Received 17 July 2024, accepted 9 August 2024, date of publication 19 August 2024, date of current version 27 September 2024.
Digital Object Identifier 10.1109/ACCESS.2024.3445413
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Large Language Models and Sentiment Analysis in Financial Markets: A Review, Datasets, and Case Study
CHENGHAO LIU1, ARUNKUMAR ARULAPPAN2, (Member, IEEE), RANESH NAHA3, (Member, IEEE), ANIKET MAHANTI1,4, (Senior Member, IEEE), JOARDER KAMRUZZAMAN5, (Senior Member, IEEE), AND IN-HO RA6, (Member, IEEE)
1 School of Computer Science, The University of Auckland, Auckland 1010, New Zealand
2 School of Computer Science Engineering and Information Systems, VIT University, Vellore 632014, India
3 School of Information Systems, Queensland University of Technology, Brisbane, QLD 4000, Australia
4 Department of Computer Science, University of New Brunswick, Saint John, NB E2K 5E2, Canada
5 Centre for Smart Analytics, Federation University Australia, Melbourne, VIC 3806, Australia
6 School of Software, Kunsan National University, Gunsan 54150, South Korea
Corresponding author: Arunkumar Arulappan (arunkumar.a@vit.ac.in)
This work was supported by the School of Computer Science Engineering and Information Systems, Vellore Institute of Technology.
The associate editor coordinating the review of this manuscript and approving it for publication was Seifedine Kadry.
ABSTRACT This paper comprehensively examines Large Language Models (LLMs) in sentiment analysis, specifically focusing on financial markets and exploring the correlation between news sentiment and Bitcoin prices. We systematically categorize various LLMs used in financial sentiment analysis, highlighting their unique applications and features. We also investigate the methodologies for effective data collection and categorization, underscoring the need for diverse and comprehensive datasets. Our research features a case study investigating the correlation between news sentiment and Bitcoin prices, utilizing advanced sentiment analysis and financial analysis methods to demonstrate the practical application of LLMs. The findings reveal a modest but discernible correlation between news sentiment and Bitcoin price fluctuations, with historical news patterns showing a more substantial impact on Bitcoin's longer-term price than immediate news events. This highlights LLMs' potential in market trend prediction and informed investment decision-making.
INDEX TERMS Large language model, Bitcoin price, sentiment analysis, machine learning, market dynamics.
I. INTRODUCTION
Sentiment analysis (SA) in financial markets has emerged as a critical study area, particularly given its widespread application in specific sectors like the stock market [1], [2], [3], [4]. This analytic approach primarily aims to discern individuals' attitudes, evaluations, and opinions regarding various entities and products. In this context, behavioral economics becomes pertinent as it delves into the psychological aspects of investor behavior, considering the influence of social, cultural, and emotional factors on decision-making processes [5]. These factors often play a significant role in explaining market anomalies [6].
Furthermore, the sentiment expressed in news, especially news covering political, social, economic, or emotional events disseminated through social media, profoundly influences investor behavior [7], [8]. As a result, information sourced from online newsgroups, social networks, and stock discussion forums has become increasingly valuable for informed business decision-making. Recently, a significant amount of research has been carried out by fundamentally analyzing unstructured text data through machine learning, involving supervised and unsupervised learning methods.
FIGURE 1. Bitcoin's approach to transaction flow and validation.
LLMs emerged due to large-scale data and the increased computational power available [9]. Armed with a wide range and variety of training data, these models have shown remarkable proficiency in mimicking human language skills, resulting in significant transformations across various fields, including the financial domain [10]. Applying LLMs to sentiment analysis represents an innovative shift, where traditional sentiment analysis challenges are reinterpreted and addressed through more advanced computational approaches [11]. Their effectiveness is particularly notable in tasks that require deep contextual understanding and nuanced language interpretation, such as predicting market trends, analyzing investor sentiments, and interpreting financial news [10], [12], [13].
Despite the growing interest in using LLMs for sentiment analysis, especially in financial markets, there remains a significant gap in understanding the extent and nature of their impact on financial instruments, particularly cryptocurrencies like Bitcoin. Existing literature predominantly focuses on the technical capabilities of LLMs without adequately exploring their practical implications in financial sentiment analysis. Our study seeks to bridge this gap not only by categorizing various LLMs and their applications in financial markets but also by empirically investigating the correlation between news sentiment, as processed by these models, and Bitcoin price movements. This approach aims to provide a more nuanced understanding of the role of media sentiment in cryptocurrency markets. To achieve this, our study answers the following research questions (RQ):
1) RQ1: How do the classification, data collection, and application of LLMs in sentiment analysis influence their effectiveness in financial markets?
2) RQ2: What is the correlation between news sentiment, as analyzed by LLMs, and the price of cryptocurrencies like Bitcoin?
This paper makes a significant contribution to the field of financial sentiment analysis by integrating the advanced capabilities of LLMs with the dynamic realm of Bitcoin and cryptocurrency markets. The study stands out for its comprehensive examination of various LLMs, including BERT, FinBERT, and ChatGPT, within the specific context of financial sentiment analysis [12], [14], [15]. This area is particularly challenging due to the nuanced language and investor sentiments intrinsic to market dynamics [16]. The systematic categorization and analysis of these LLMs in the paper illuminate their individual strengths and collective potential in enhancing financial market analytics. The focus on the unique features and applications of these LLMs in the financial domain reveals new insights into their transformative role in market trend prediction and investment decision-making.
FIGURE 2. Structure of this paper.
By identifying a modest but discernible correlation between news sentiment and Bitcoin prices, the paper contributes valuable empirical evidence to the understanding of cryptocurrency market dynamics. This insight is crucial for a range of stakeholders, including investors, financial analysts, and policymakers, who navigate the complexities of these emerging markets. Additionally, the discussion of challenges and future directions for LLMs in sentiment analysis highlights both the current capabilities and potential growth areas for these models in financial applications. Fig. 1 illustrates the process of Bitcoin transactions within a peer-to-peer network, showcasing how transactions are signed and added to the blockchain through mining, while also highlighting various factors that can influence Bitcoin's market price.
This study is structured as follows: Section II presents the existing literature closely related to sentiment analysis in financial markets. Section III outlines our research methodology. Sections IV to VI collectively address RQ1. Section IV presents a detailed classification of LLMs in sentiment analysis. Section V discusses the data collection method and categorization. Section VI explores the applications of LLMs in sentiment analysis. Section VII provides a case study aimed at answering RQ2, offering practical insights into applying these models in a real-world scenario. Sections VIII and IX discuss the challenges that should be overcome when employing LLMs to solve sentiment analysis tasks and highlight promising opportunities and directions for future research. The conclusions of our study are presented in Section X. The overall organization of the paper is presented in Fig. 2.
II. LITERATURE REVIEW
Among the various LLMs, BERT (Bidirectional Encoder Representations from Transformers) [14] has set a new precedent in natural language processing by understanding the context of a word in a sentence more holistically. BERT's architecture has been utilized in the financial sector to create FinBERT [12], a model specifically fine-tuned to grasp the subtleties of financial jargon and sentiment. FinBERT [12] excels in interpreting complex financial reports, earnings calls, and market analyses, providing more accurate sentiment predictions than general-purpose models [17]. Additionally, Ploutos [18], another financial LLM, demonstrates superior performance in predicting stock movements. This model uniquely integrates textual and numerical data using a mixture-of-experts architecture, enhancing its ability to deliver precise explanations for its predictions. A further groundbreaking LLM is ChatGPT [15], which has been instrumental in enhancing interactive financial analysis. ChatGPT's ability to engage in human-like conversations and provide detailed, contextually relevant responses has been utilized in customer service automation, financial advisory, and real-time market analysis [19]. This model's sophisticated understanding of queries and ability to generate coherent and context-aware responses make it an invaluable tool in the dynamic world of finance.
Several recent studies focus on various applications and advancements of LLMs in financial sentiment analysis. In a recent study, Sharma et al. [20] explored the use of generative models like ChatGPT for sentiment analysis. These models enhance sentiment analysis by augmenting datasets with synthetic labeled data and simulating human sentiment expression, particularly for tasks like sarcasm detection. Key challenges include maintaining the quality and consistency of generated data and addressing inherent biases. By overcoming these issues, the potential for sentiment analysis in real-world applications can be significantly enhanced. The architectures and applications of large language models, including their use in sentiment analysis, are presented by Raiaan et al. [21]. They categorize different LLMs, such as GPT-3, and explore their applications in various domains, including finance. The paper also addresses the challenges and open issues in deploying LLMs for sentiment analysis, such as data scarcity and model interpretability. The increasing role of generative AI models, such as GPT-3, in business and finance is discussed in [22]. The work highlights the potential of these models to generate realistic financial data, perform sentiment analysis, and support decision-making processes. The paper also explores the ethical implications and regulatory challenges associated with using generative AI in financial markets.
Dong et al. [23] investigate the application of LLMs for extracting relevant information from financial documents. The authors employ GPT-3 to analyze annual reports, earnings call transcripts, and other financial texts to identify key sentiment indicators and predict stock price movements. The study shows that LLMs can effectively process and interpret large volumes of text data, providing valuable insights for investors and analysts. Farimani et al. [24] investigate the efficiency and accuracy of using LLMs like GPT-3 for sentiment analysis in the financial market. The authors compare the performance of LLMs with traditional models, demonstrating significant improvements in capturing nuanced sentiments and predicting market trends based on financial news and social media data. Another early review [25] emphasizes the potential of both BERT and GPT-2 in advancing financial sentiment analysis through improved feature mapping techniques, leveraging their respective strengths in understanding context and generating relevant text.
While previous studies have significantly advanced financial sentiment analysis using models like FinBERT and integrated approaches combining sentiment indices with predictive models, our approach introduces a novel perspective by examining more specific aspects, namely classification, data collection, and application, together with a case study. A central element of this research is the empirical investigation into the correlation between news sentiment, as analyzed by LLMs, and Bitcoin price movements. This case study is particularly relevant given the growing influence of cryptocurrencies like Bitcoin in global financial markets. Bitcoin serves as a benchmark for the digital currency landscape, characterized by its volatility, decentralized nature, and sensitivity to public sentiment and news [26]. The study addresses the pressing need to understand Bitcoin's market behavior due to its escalating impact on retail and institutional investors and its potential to reshape financial technology and monetary transactions [27].
III. RESEARCH METHOD
This literature review adheres to the methodology proposed by Kitchenham et al. [28], [29]. Following the guidelines provided by Kitchenham et al. [28], our method included two main steps: planning and conducting the review. Established academic databases were utilized to gather the relevant literature, including Web of Science, IEEE Xplore, Springer, arXiv, and the UoA (University of Auckland) Library. The following sections describe the methodology used to source and evaluate the chosen literature. Specifically, Fig. 3 presents the structure of the literature review.
Our manual search encompasses four critical databases known for their comprehensive collections of scientific papers. The methodology involved a multi-step process, beginning with creating a keyword dictionary instrumental in the initial search across these databases.
FIGURE 3. LLMs classification and literature review methodology in financial sentiment analysis.
Our search string should combine two sets of keywords: one related to sentiment analysis and the other to LLMs. If a paper contains both types of keywords, it is more likely to be a paper we need. The complete set of search keywords is as follows:
1) Keywords related to sentiment analysis: Sentiment detection, Opinion mining, Emotional analytics, Affective computing, Polarity classification, Subjectivity analysis, Sentiment scoring, Mood analysis, Opinion polarity, Sentiment quantification, Emotion recognition, Tone analysis, Sentiment lexicons, Sentiment metrics, Textual affect detection, Semantic orientation, Sentiment strength, Sentiment benchmarks, Sentimental analysis tools, Review analysis, Consumer sentiment, Investor sentiment, Market sentiment, Brand sentiment, Social sentiment, Sentiment correlation, Aspect-based sentiment analysis, Sentiment summarization, Sentimental classification, Sentimental interpretation.
2) Keywords related to LLMs: LLM, Language Model, Large Language Model, Pre-trained, PLM, Pre-training, NLP, Natural Language Processing, DL, Deep Learning, ML, Machine Learning, ChatGPT, Neural Network, Transfer Learning, Sequence Model, T5, GPT, Codex, BERT, Transformer, Attention Model, AI, Artificial Intelligence.
We included keywords like Machine Learning and Deep Learning, alongside other terms not directly related to Large Language Models (LLMs), in our search criteria. This broader approach aims to ensure that we do not overlook any relevant research, thereby expanding our search scope during automated searches.
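As a simple illustration of how the two keyword groups can be combined into a single boolean query, the sketch below assembles such a search string; the quoting and field syntax differ across databases, so the exact format shown is an assumption for illustration rather than the string used in our searches.

```python
# Minimal sketch: building a database search string from the two keyword groups.
sentiment_keywords = [
    "Sentiment detection", "Opinion mining", "Emotional analytics",
    "Polarity classification", "Investor sentiment", "Market sentiment",
]
llm_keywords = [
    "Large Language Model", "LLM", "Pre-trained", "ChatGPT",
    "BERT", "GPT", "Transformer",
]

def quoted_or(terms):
    """Join terms into one OR-clause, quoting multi-word phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

# A paper must match at least one term from each group (AND of two OR-blocks).
search_string = f"({quoted_or(sentiment_keywords)}) AND ({quoted_or(llm_keywords)})"
print(search_string)
```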
Upon retrieving these papers, the next step involved a detailed examination of the titles and abstracts, which allowed us to determine the relevance of each paper to the research objectives based on inclusion and exclusion criteria. We designed these criteria following several state-of-the-art papers [30], [31], as shown in Table 1, so that the selected documents directly address our topic. We dropped duplicated studies across multiple databases to refine our dataset, streamlining our literature collection.
TABLE 1. Inclusion criteria and exclusion criteria.
Following the curation of unique studies, we proceeded to a more in-depth review, scanning the full text of each selected paper. A thorough quality assessment can help mitigate biases that may arise from low-quality studies and guide readers on where to approach conclusions with caution [32]. We developed a set of ten Quality Assessment Criteria (QAC), detailed in Table 2. These criteria evaluate the papers' relevance, clarity, validity, and importance. The final stage in our search process was to conduct a quality assessment of these primary studies, evaluating them against predefined criteria to ensure that only the most rigorous and relevant research was included in our analysis.
The systematic literature review on LLMs for sentiment analysis acknowledges the risk of missing key studies due to potential gaps in keyword summarization. To mitigate this, a dual approach combining manual review and automated searches was utilized, with keywords derived from authoritative sources and forward and backward snowballing techniques employed to ensure thoroughness. Additionally, to counter study selection bias, defined inclusion and exclusion criteria were established, and a QAC framework was implemented, with ambiguous cases receiving manual scrutiny. This blend of strategies aimed to balance efficiency with meticulousness, reducing biases and enhancing the review's validity.
TABLE 2. Checklist of quality assessment criteria (QAC) for studies on LLMs in sentiment analysis.
IV. CLASSIFICATION OF LARGE LANGUAGE MODELS IN SENTIMENT ANALYSIS
This section explores the classification of LLMs in the context of sentiment analysis, emphasizing how their size and architecture impact their effectiveness. We categorize LLMs based on their structural design, distinguishing between encoder-only, encoder-decoder, and decoder-only models, each with distinct capabilities in processing natural language. For an in-depth exploration of the relevant literature, we have included a comprehensive summary in Table 4.
A. LARGE LANGUAGE MODELS
Pre-trained language models (PLMs) have proven highly effective in various natural language processing (NLP) tasks, as evidenced in several studies [33], [34]. Researchers have noted that increasing the size of these models significantly boosts their capabilities, particularly once the number of parameters exceeds a certain point [35], [36]. The designation 'Large Language Model' (LLM) is used to differentiate language models based on their size, primarily referring to PLMs of larger scale [37]. However, it is important to mention that there is no widely agreed-upon standard in the literature for the minimum parameter size of an LLM, as its efficiency is linked to the dataset's size and the total computing power used. In our study, we follow the classification and taxonomy of LLMs introduced by Pan et al. [38], dividing mainstream LLMs into three categories based on their architecture: encoder-only, encoder-decoder, and decoder-only. This classification and the corresponding models are depicted in Fig. 3.
1) ENCODER-ONLY LLMs
Encoder-only LLMs are a specific type of neural network framework that employs solely the encoder part of the model. The primary role of the encoder is to process and transform the input text into a hidden representation. This representation is critical in understanding the connections among words and the general context of the sentence. Prominent examples of encoder-only LLMs include BERT [14] and various adaptations of it [12], [39], [40]. BERT, in particular, is built on the encoder architecture of the Transformer [41]. Its unique feature is the bidirectional attention mechanism, which allows it to analyze the context to the left and right of each word concurrently during its training phase. In the financial domain, other prominent models like FinBERT [12], CryptoBERT [42], and SBERT [26] have been widely employed.
These models distinguish themselves from the original BERT [14] by enhancing the architecture with novel pre-training tasks or adjusting to different data modalities, thereby improving their effectiveness for finance-related tasks. For instance, FinBERT [12] is an adaptation of BERT [14] that is pre-trained explicitly on financial corpora and fine-tuned to perform sentiment analysis within the financial domain, achieving an accuracy of 0.86 and an F1-score of 0.84. Similarly, CryptoBERT [42], which is also grounded in the BERT [14] model, undergoes fine-tuning on a cryptocurrency-specific corpus, yielding heightened accuracy in the sentiment classification of texts related to cryptocurrencies. It achieved an accuracy of 55.60 and an F1-score of 55.79 among five models on the StockTwits (https://stocktwits.com/) data, which contains 1.875 million posts. These models have demonstrated their proficiency in various applications, such as predicting market movements, analyzing investor sentiments, and automating financial report summaries, showcasing their transformative impact on the financial analytics landscape.
2) ENCODER-DECODER LLMs
Encoder-decoder LLMs integrate both the encoder and decoder components [41]. The encoder converts the input text into a hidden representation, adeptly grasping its fundamental structure and meaning. This hidden representation acts as a transitional language, facilitating the connection between input and output formats. The decoder, in turn, leverages this hidden representation to produce the desired output text, transforming the abstract representation into specific, contextually appropriate phrases. Within this context, the memory module of models like FINMEM [43] stands out. It mirrors human cognitive processes, providing clear interpretability and flexibility for real-time adjustments. This feature enhances the model's utility in financial trading by allowing it to hold on to essential information for extended periods, which is crucial for complex decision-making. FINMEM outperformed other approaches in trading five different stocks, achieving the highest Sharpe ratio of 2.6789 and the lowest maximum drawdown of 10.7996%. Another example is TradingGPT [44], an innovative LLM multi-agent framework endowed with layered memories. The ability of TradingGPT to navigate through financial data and
its application in trading exemplify how encoder-decoder LLMs can be potent tools in enhancing trading strategies [44].
3) DECODER-ONLY LLMs
Decoder-only LLMs exclusively use the decoder module to produce the intended output text. They follow a unique training approach focused on sequential prediction [45]. Contrary to the encoder-decoder framework, where the encoder handles the input text, the decoder-only structure starts from a base state and sequentially predicts tokens, thereby progressively constructing the output text. This method heavily depends on the model's proficiency in grasping and predicting language structure, syntax, and context. Key examples of this architecture include the GPT series models such as GPT-1, GPT-2, GPT-3, and GPT-4, and their significant variant, ChatGPT (https://chat.openai.com/) [45], [46], [47], [48]. The GPT series has shown promising performance in financial sentiment analysis, not only for Twitter news but also in terms of accuracy, recall, and F1-score across news for different forex pairs [49], [50]. These models demonstrate their capability to excel in financial contexts, highlighting their potential for improving sentiment analysis and market prediction tasks.
These models can execute downstream tasks with minimal input, often requiring just a handful of examples or straightforward instructions. This attribute eliminates the need for additional prediction heads or extensive fine-tuning processes, rendering them particularly valuable in sentiment analysis research. For instance, recent developments in the industry have witnessed Google unveiling Bard, while Meta has introduced its models, LLaMA [51] and LLaMA2 [52], alongside Microsoft's foray with Bing Chat (https://www.microsoft.com/en-us/edge/features/bing-chat). One application of LLaMA in the realm of financial sentiment analysis is demonstrated by FinMA, a version of LLaMA specifically fine-tuned for this task, which recorded the highest F1-score of 0.87 on the FiQA dataset [53]. Furthermore, LLaMA2 has proven effective, reaching an accuracy of 84.03% through supervised learning and the alignment of financial texts [54]. These developments highlight the capabilities of LLaMA models in sentiment analysis, particularly their proficiency in the precise interpretation and assessment of financial sentiments.
V. DATA ACQUISITION AND CLASSIFICATION FOR LLMs IN SENTIMENT ANALYSIS
This section examines the methodologies employed in collecting and utilizing datasets for sentiment analysis with LLMs. It underscores the pivotal role of data in training LLMs, emphasizing the need for diversity and comprehensiveness in dataset collection to enhance model performance in varied contexts [55]. We explore the systematic process of dataset categorization, preprocessing, and formatting, which is essential for aligning data with the model's training objectives and processing needs.
A. SOURCING DATASETS FOR TRAINING LARGE LANGUAGE MODELS
Data is a vital and essential component in training Large Language Models (LLMs), significantly influencing their generalization capabilities, efficiency, and overall performance [55]. An ample amount of high-quality and varied data enables models to thoroughly learn features and patterns, fine-tune their parameters, and maintain dependability during validation and testing.
Our initial focus is on examining the methodologies for dataset acquisition. Through this analysis of data collection techniques, we have categorized the sources of data into four groups: open-source datasets, datasets that are actively collected, datasets that are specifically constructed, and datasets derived from industrial sources. Open-source datasets [56], [57] are publicly available data compilations typically distributed via open-source platforms or repositories. An example is the FiQA [56] dataset, a substantial new dataset featuring question-answering pairs focused on financial reports, crafted by experts in finance. The credibility of these datasets is bolstered by their open-source status, enabling community-based updates and ensuring their reliability for scholarly research.
The Financial PhraseBank, first introduced by Malo et al. [58], consists of 4,845 English sentences randomly selected from financial news articles in the LexisNexis database. These sentences were annotated by 16 experts in finance and business, who evaluated how the information could influence the stock prices of the companies discussed. Furthermore, the dataset includes information about the level of agreement among the annotators regarding the sentiments expressed in the sentences.
TRC2-financial is a specialized subset of the TRC2 collection from Reuters (https://trec.nist.gov/data/reuters/reuters.html), which encompasses 1.8 million news articles released between 2008 and 2010. This subset specifically contains 46,143 documents, totaling nearly 29 million words and close to 400,000 sentences [12].
SemEval 2017 Task 5 focuses on fine-grained sentiment analysis (FSA) of news headlines and microblogs [59]. The training set for this task includes 1,142 financial news headlines and 1,694 microblog posts, each annotated with target entities and their corresponding sentiment scores. The test set comprises 491 financial news headlines and 794 posts [11].
Collected datasets [26], [60] are compiled by researchers from diverse sources, such as significant websites, forums, blogs, and social media. Researchers often extract data from sources like Twitter and Reddit for datasets specifically tailored to their research inquiries.
Constructed datasets [12] are researcher-generated datasets derived from modifying or enhancing collected datasets to align with specific research goals. Manual or semi-automatic modifications can include creating domain-specific tests, annotated datasets, or synthetic data.
TABLE 3. Data types of datasets involved in prior studies.
Industrial datasets [10], sourced from commercial or industrial entities, contain proprietary data and are essential for research addressing real-world business contexts. It is also worth acknowledging that certain studies utilize diverse datasets encompassing several categories. For instance, Wu et al. [10] trained BloombergGPT using multiple datasets, e.g., complex table datasets and question-answering pairs.
B. VARIETY OF DATASETS IN EXISTING LLMs FOR SENTIMENT ANALYSIS STUDIES
The data types are crucial in determining the architecture and choice of LLMs, as they directly affect the extraction of implicit features and the decisions made by the model [67]. The selection of specific data types can significantly influence the LLMs' overall effectiveness and ability to generalize. In our research, we explore and categorize the various types of financial datasets used in studies of LLMs for sentiment analysis. By examining how data types relate to model architectures and their performance, we aim to highlight the importance of data types in the effectiveness of LLMs for sentiment analysis.
We classified the data types of all datasets into five categories: Twitter posts, Reddit posts, news articles, annual reports, and FiQA. Table 3 describes the specific data included in the data types corresponding to the datasets we summarized from the 15 studies.
VI. APPLICATIONS OF LLMs IN FINANCIAL SENTIMENT ANALYSIS
This section delves into LLMs' diverse and transformative applications in financial sentiment analysis. In recent years, integrating advanced LLMs into the financial sector has
marked a significant evolution in how financial data, market trends, and investor sentiments are analyzed and interpreted. This section explores how LLMs predict market trends, optimize trading strategies, and forecast stock prices.
A. PREDICTIVE ANALYTICS IN CRYPTOCURRENCY MARKETS USING LLMs
This section explores the application of LLMs for predicting cryptocurrency market trends, with a particular focus on integrating sentiment analysis into these predictions. The potential of LLMs to distill sentiment from vast datasets offers a novel dimension to forecasting models, as evidenced by several recent studies. Zou and Herremans [61] introduced a pioneering multimodal model, PreBit, specifically designed to anticipate significant Bitcoin price movements. Bashchenko [26] provided insights that counter the notion of Bitcoin's value being purely speculative, demonstrating that non-endogenous news carries fundamental information affecting Bitcoin prices.
Raheman et al. [62] highlighted the practical advantages of interpretable AI and NLP methods over non-explainable alternatives, suggesting that transparency in AI could lead to more valuable applications in the financial sector. Ider and Lessmann [68] demonstrated the advantages of refining FinBERT with weakly labeled data, illustrating how even imprecisely labeled datasets can significantly improve text-based feature prediction and forecasting accuracy for cryptocurrency returns. Their study utilized a dataset comprising 433 test samples, with a noteworthy agreement rate of 92.6% among all 16 expert labels. This approach facilitated the development of predictive models for Bitcoin and Ethereum that substantially outperformed baseline models, achieving gains of 0.572 and 0.501, respectively. This evidence underscores the efficacy of leveraging weak labels in enhancing the performance of financial prediction models, particularly in the volatile domain of cryptocurrency markets.
Ortu et al. [63] investigated cryptocurrency price prediction by analyzing social sentiment data from GitHub and Reddit, employing a pre-trained BERT-based model to synthesize emotional and sentiment indicators from social media commentary into hourly and daily series datasets. Their findings indicated that incorporating these social sentiment metrics markedly enhances the predictive accuracy for the daily pricing of Bitcoin and Ethereum. The research highlights a significant inverse relationship between negative sentiment and price volatility within the Bitcoin market, suggesting that users might interpret volatility as a speculative opportunity. In contrast, Ethereum market sentiment is predominantly influenced by emotional arousal, which shows a substantial positive correlation with negative sentiment, indicating that community reactions are more emotionally driven rather than directly related to price movements.
Building on these findings, Nguyen et al. [27] explored the distinctive impact of ChatGPT-based sentiment indicators on Bitcoin returns, revealing its adeptness at sentiment detection. The study's results prompt further investigation into how generative AI might enhance financial data analysis and social media sentiment interpretation, potentially unlocking more sophisticated market insights. This research opens up new pathways for sentiment analysis in financial markets, leveraging AI technologies.
B. SENTIMENT-DRIVEN LLM STRATEGIES FOR FINANCIAL TRADING
Developing a robust trading strategy is crucial in the volatile realm of financial markets, where integrating sentiment analysis and LLMs can provide a competitive edge. Kim et al. [60] leveraged an LLM adapted to the crypto domain, called CBITS, to parse crypto news sentiment. Their research demonstrates that trading strategies augmented with sentiment scores significantly outperform conventional models, underscoring the efficacy of sentiment-based trading approaches. Backtesting various Bitcoin trading strategies, their study reveals that models employing TabNet combined with RoBERTa, specifically the TabNet RoBERTa top-10 configuration, yield the highest profit, recording an impressive gain of 304.65%. In contrast, other models assessed during the same test period generated negative returns.
Yu et al. [43] introduced FINMEM, an innovative LLM-based framework crafted for financial decision-making. This framework is structured around three central modules: Profiling, which tailors the agent to specific investor profiles; Memory, which processes financial information in a layered manner akin to human cognitive structures, facilitating deeper assimilation of financial data; and Decision-making, which translates the processed information into actionable investment strategies. The adaptability of FINMEM, particularly its memory module, provides a level of interpretability that mirrors human trading logic, coupled with the capability for real-time adjustment to optimize trading decisions.
Li et al. [44] took the concept further by developing an LLM multi-agent framework with layered memories called TradingGPT. The LLMs at the heart of this framework act as decision-making cores for trading agents, utilizing the layered memory system to synthesize historical data and current market conditions. This innovative approach enables the agents to engage in strategic dialogues with peers, refine their investment choices, and uphold a diverse yet robust decision-making process informed by their unique trading personas.
Curtó et al. [69] provided empirical evidence showcasing the adaptability of LLM-informed strategies to the dynamic bandit problem, a standard paradigm in trading strategy formulation. Their experiments underscore the ability of LLMs to navigate the complexities of the financial markets, yielding a strategy that competes favorably with traditional methods even in unpredictable scenarios.
Gupta [65] aimed to streamline the analysis of annual reports across various firms by harnessing the analytical prowess of LLMs. A machine learning model was trained
using these insights as predictive features, by distilling insights from the LLMs into a quantitatively styled dataset and supplementing it with historical stock prices. Walk-forward testing indicated that such a model could significantly outperform benchmarks like the S&P 500 returns, underscoring the potential of GPT-3.5 to revolutionize trading strategies. The research revealed that the model, when used to select the top k stocks, consistently generated higher returns than the S&P 500. Notably, the returns were inversely related to the value of k, with lower k values correlating with higher returns. This outcome indicates that the stocks predicted as top performers by the GPT model indeed yielded better financial results.
C. ENHANCING STOCK MARKET FORECASTING WITH LLMs
Our analysis underscores the broad utility of LLMs in stock price prediction through sentiment analysis, showcasing their versatility across various financial applications. Araci introduced FinBERT, a model tailored for the financial sector, demonstrating superior capabilities in economic text mining and suggesting further application of FinBERT across different financial NLP tasks. FinBERT's utility could be significantly extended by integrating more extensive stock market datasets, presenting opportunities for more intricate market analysis and model refinement.
Mishev et al. [11] provided evidence that contextual embeddings substantially improve efficiency for sentiment analysis over traditional lexicons and static word encoders, a benefit that holds even in the absence of large datasets. This advancement points to the potential of LLMs to revolutionize sentiment analysis with a more profound understanding of contextual nuances in financial texts.
Deng et al. [66] revealed that LLMs can achieve remarkable outcomes in market sentiment analysis. The study showed that with minimal examples, it is possible to calibrate a 'student' model that matches or surpasses the performance of more extensive, state-of-the-art models, optimizing both effectiveness and computational efficiency.
Fazlija and Harder [64] identified that sentiment scores derived from news content play a critical role in predicting the direction of stock prices. The correlation between news sentiment and market performance underscores the value of high-quality, content-based sentiment indicators in forecasting models.
VII. CASE STUDY REGARDING THE CORRELATION BETWEEN NEWS SENTIMENT AND BITCOIN PRICE
This case study aims to explore the relationship between the sentiment expressed in cryptocurrency news articles and the price fluctuations of Bitcoin. Leveraging the power of sentiment analysis through advanced language models, this study seeks to provide a deeper understanding of how public sentiment, as reflected in the media [16], can impact financial markets, particularly the volatile cryptocurrency sector.
FIGURE 4. Dataset creation process.
A. DATA COLLECTION AND ANALYSIS METHOD
1) CRYPTOCURRENCIES DATA
We collected comprehensive daily cryptocurrency data from the investing website www.investing.com (https://www.investing.com/crypto/bitcoin/btc-usd) to investigate this relationship. The dataset spans two years, from November 1, 2021, to November 1, 2023, and encompasses various metrics, including price, closing price, highest and lowest price of the day, opening price, and volume of transactions. Consistent with methodologies employed in similar studies [70], the price was chosen as the primary target variable. This decision is based on the daily price's everyday use as a critical indicator of market sentiment in financial research, as noted by Bashchenko [26], providing a reliable measure of the market's end-of-day valuation for Bitcoin.
Given the current existence of approximately 1,000 cryptocurrency coins, some of which suffer from incomplete information or delayed publication, our selection criteria focused on coins with at least 1,000 recorded observations. This threshold ensures the accumulation of sufficient data for our analyses. Importantly, the use of transfer entropy as a methodological approach in our study offers the advantage of not necessitating a balanced dataset, thus allowing for a broader inclusion of data points. Our dataset represents over 80% of the cryptocurrency market's total market capitalization, ensuring a comprehensive analysis scope.
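A minimal sketch of this selection rule is shown below; the file and column names are hypothetical and stand in for the actual data layout.

```python
import pandas as pd

# Keep only coins with at least 1,000 recorded daily observations,
# mirroring the selection threshold described above.
prices = pd.read_csv("crypto_daily_prices.csv")   # hypothetical columns: coin, date, price, ...

obs_counts = prices.groupby("coin")["price"].count()
eligible_coins = obs_counts[obs_counts >= 1000].index

filtered = prices[prices["coin"].isin(eligible_coins)]
print(f"Kept {len(eligible_coins)} coins out of {obs_counts.size}")
```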
2) NEWS DATA
To gather cryptocurrency-related news data, we employed an open-source Python library, snscrape (https://github.com/JustAnotherArchivist/snscrape), renowned for its efficiency in web scraping. This tool was instrumental in compiling a substantial dataset of tweets about various cryptocurrencies. To ensure targeted and relevant data collection, we used a set of carefully selected search keywords for each cryptocurrency. For instance, in the case of Bitcoin, the search parameters included a combination of its name and symbol, such as 'BTC OR BTC OR BITCOIN OR Bitcoin'. Aligning with the timeframe of our price data, the collection period for the news data was also set from November 1, 2021, to November 1, 2023.
TABLE 4. Summary of literature review.

Given the sheer volume of cryptocurrency-related news and the constraints of our computational resources, it was necessary to limit the quantity of news collected daily for each cryptocurrency. Without such a limit, the sentiment analysis process would have been impractically prolonged, potentially taking months. To achieve this, we implemented a timed request mechanism, in which news was requested every 20 seconds from a single computer. This approach was crucial to avoid triggering anti-crawler mechanisms on the websites we scraped, while also ensuring a consistent data collection rate. By limiting requests in this manner, we capped the collection at a maximum of 5,000 news articles per day for each cryptocurrency, totaling 18,506 articles for the entire study period. For each piece of news, we meticulously recorded several key attributes: the date and time of the post (DateTime), the headline, the main text of the article, the author's information, the URL, and a few other relevant features.
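A minimal sketch of this timed, capped collection loop is given below; the query syntax and item attribute names depend on the snscrape version and are illustrative rather than a reproduction of our exact crawler.

```python
import time
import datetime as dt
import snscrape.modules.twitter as sntwitter

QUERY = "BTC OR BITCOIN OR Bitcoin"
START, END = dt.date(2021, 11, 1), dt.date(2023, 11, 1)
DAILY_CAP = 5000          # maximum items kept per day
PAUSE_SECONDS = 20        # delay between requests to avoid anti-crawler blocks

records = []
day = START
while day < END:
    day_query = f"{QUERY} since:{day} until:{day + dt.timedelta(days=1)}"
    scraper = sntwitter.TwitterSearchScraper(day_query)
    for i, item in enumerate(scraper.get_items()):
        if i >= DAILY_CAP:
            break
        records.append({
            "datetime": item.date,
            "text": item.rawContent,     # attribute name varies across snscrape versions
            "url": item.url,
            "author": item.user.username,
        })
    time.sleep(PAUSE_SECONDS)            # timed pause before the next day's request
    day += dt.timedelta(days=1)
```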
3) SENTIMENT CLASSIFIERS
FinBERT, introduced by Araci in 2019 [12], stands as the first finance domain-specific BERT model, pre-trained on the expansive TRC2-financial corpus. This corpus, a specialized subset of Reuters' TRC2, comprises approximately 1.8 million news articles published between 2008 and 2010. Given the scope of this paper, a detailed exploration of the BERT architecture is beyond our purview, but readers are encouraged to consult Araci's original work for a comprehensive understanding.
The FinBERT model underwent further fine-tuning using the Financial PhraseBank, a resource developed by Malo et al. in 2013 [71], specifically for sentiment classification tasks within the financial domain. FinBERT's performance in financial sentiment analysis tasks showed a notable 15% improvement over generic BERT models [12]. This enhancement in accuracy, together with the successful application of FinBERT in studies parallel to ours, such as those by Zou and Herremans [61] and Farimani et al. [72], underscored its suitability for our research objectives.
For our study, we opted to employ the FinBERT model in its pre-fine-tuned state, as initially configured by Araci [12]. This decision was driven by the nature of our dataset, which primarily consists of unlabeled news articles. Further fine-tuning of FinBERT on other labeled datasets was deemed unnecessary, considering it has already been optimized for sentiment classification using the Financial PhraseBank. By applying this ready-to-use, finely tuned FinBERT model to our news data, we aimed to leverage its advanced capabilities for accurate sentiment analysis in the financial sector without additional training. FinBERT outputs a sentiment score for each news article on a scale from 1 to 10, where 10 indicates high confidence that the news positively impacts Bitcoin prices. Finally, the data are stored in a database (Fig. 4).
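As an illustration of how such pre-fine-tuned scoring can be applied, the sketch below uses the publicly available ProsusAI/finbert checkpoint from the Hugging Face Hub; both the checkpoint choice and the mapping of its positive/negative/neutral probabilities onto the 1-10 scale described above are assumptions for illustration, not our exact configuration.

```python
from transformers import pipeline

# Off-the-shelf FinBERT checkpoint (illustrative; returns positive/negative/neutral scores).
classifier = pipeline("text-classification", model="ProsusAI/finbert", top_k=None)

def score_headline(text: str) -> float:
    """Map FinBERT label probabilities onto an illustrative 1-10 sentiment scale."""
    out = classifier(text)
    preds = out[0] if isinstance(out[0], list) else out   # handle both return formats
    scores = {d["label"].lower(): d["score"] for d in preds}
    # 10 = strongly positive for Bitcoin, 1 = strongly negative (illustrative mapping).
    return 1 + 9 * (scores["positive"] + 0.5 * scores["neutral"])

print(score_headline("Bitcoin surges after ETF approval rumors"))
```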
4) ENGLE'S BIVARIATE DCC-GARCH TECHNIQUE IN FINANCIAL RETURN ANALYSIS
Engle's bivariate Dynamic Conditional Correlation-Extended Generalized Autoregressive Conditional Heteroskedasticity (DCC-GARCH) technique, introduced in 2002, is a cornerstone model for analyzing the co-movement of financial returns. This technique has two significant advantages over other variants in the GARCH model family, such as the Baba-Engle-Kraft-Kroner (BEKK) and Constant Conditional Correlation (CCC) models. First, it exhibits a superior capacity to capture time-varying conditional covariance, achieved with less computational complexity than the BEKK model, making it more efficient and accessible for complex analyses. Second, unlike the CCC model, which assumes constant correlations over time, the DCC model allows correlations to vary, adding flexibility and realism to the analysis.
The bivariate DCC model's simplicity is particularly beneficial in many return-series contexts. A key strength is its ability to directly account for heteroscedasticity by calculating Dynamic Conditional Correlations (DCCs) from standardized residuals. As Chiang et al. [73] noted, this approach ensures that the DCCs are free from biases associated with volatility clustering, addressing concerns highlighted by Forbes and Rigobon [74]. Additionally, the DCC model's proficiency in generating accurate, time-varying estimates of volatilities and correlations is invaluable. It capably reflects the latest market news and responds to regime shifts triggered by shocks and crises. This dynamic analysis of correlations over time facilitates more informed asset allocation and hedging decisions.
Implementing the DCC model involves a two-step process to ascertain conditional correlations. A univariate GARCH model is initially estimated for each return series, yielding the conditional variance. Subsequently, dynamic conditional correlations are derived from the standardized residuals. Following the methodology described by Bauwens and Laurent, the model is delineated as follows:

$$ R_t = \mu_t + \Sigma_t^{0.5} Z_t \quad (1) $$

where the return vector $R_t = (r_t^{S}, r_t^{j})'$ contains the sector index return $r_t^{S}$ and the return of the selected alternative investment $r_t^{j}$, $\mu_t = (\mu_t^{S}, \mu_t^{j})'$ is the conditional mean process, and $Z_t \overset{iid}{\sim} N(0,1)$ is a $(2 \times 1)$ vector of independent, identically distributed random variables. The conditional covariance matrix is $\Sigma_t = D_t C_t D_t$, with the conditional correlation matrix

$$ C_t = \big[\rho_{S/j,t}\big] = \operatorname{diag}(Q_t)^{-\frac{1}{2}}\, Q_t\, \operatorname{diag}(Q_t)^{-\frac{1}{2}} \quad (2) $$

and

$$ D_t = \operatorname{diag}\!\left(\sqrt{h_t^{S}},\, \sqrt{h_t^{j}}\right) \quad (3) $$

where $h_t^{S}$ and $h_t^{j}$ denote the univariate GARCH variances. The $(2 \times 2)$ symmetric positive definite matrix $Q_t$ is given by

$$ Q_t = (1 - \alpha - \beta)\,\bar{Q} + \alpha\, \eta_{t-1}\eta_{t-1}' + \beta\, Q_{t-1} \quad (4) $$

where $\bar{Q}$ is the unconditional correlation matrix of the standardized innovations $\eta_t$, and the positive scalars $\alpha$ and $\beta$ are restricted to $\alpha + \beta < 1$. We obtain the DCCs by

$$ \rho_{S/j,t} = \frac{q_{S/j,t}}{\sqrt{q_{S,t}\, q_{j,t}}} \quad (5) $$
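The second step of this procedure can be sketched directly: given standardized residuals from the univariate GARCH fits and fixed values of α and β, the recursion below implements Eqs. (4)-(5) to produce the time-varying conditional correlation. The parameter values and simulated residuals are illustrative; in practice α and β are estimated by quasi-maximum likelihood.

```python
import numpy as np

def dcc_correlations(eta: np.ndarray, alpha: float, beta: float) -> np.ndarray:
    """DCC recursion of Eqs. (4)-(5) for standardized residuals eta of shape (T, 2)."""
    T, _ = eta.shape
    Q_bar = np.cov(eta, rowvar=False)         # unconditional correlation of eta (Q-bar)
    Q = Q_bar.copy()
    rho = np.empty(T)
    for t in range(T):
        if t > 0:
            outer = np.outer(eta[t - 1], eta[t - 1])
            Q = (1 - alpha - beta) * Q_bar + alpha * outer + beta * Q   # Eq. (4)
        d = np.sqrt(np.diag(Q))
        rho[t] = Q[0, 1] / (d[0] * d[1])                                # Eq. (5)
    return rho

# Example with simulated residuals and the restriction alpha + beta < 1.
rng = np.random.default_rng(0)
eta = rng.standard_normal((500, 2))
print(dcc_correlations(eta, alpha=0.001, beta=0.987)[-5:])
```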
5) TRANSFER ENTROPY
Transfer entropy offers distinct advantages over traditional methods, enhancing its capability to evaluate information flows, as highlighted by Barnett et al. [75]. Unlike conventional econometric models that rely heavily on domain-specific assumptions and constraints, transfer entropy facilitates a non-parametric analysis of time-series data, minimizing the need for extensive presumptions about the underlying stochastic processes. Fundamentally, transfer entropy is grounded in econophysics, focusing on quantifying the directional information flow of a variable over time, and is rooted in information theory. This concept was originally introduced by Shannon in 1948:

$$ H_I = -\sum_i p(i) \cdot \log\big(p(i)\big) \quad (6) $$

In this context, $i$ denotes a discrete random variable characterized by its probability distribution $p(i)$, reflecting the various outcomes it may manifest. $H$ is identified as the function facilitating this transformation, and $H_I$ is known as Shannon entropy. Shannon's [76] seminal work in 1948 established the groundwork for this methodology, focusing on the uncertainty and dynamism in a variable's processes. Subsequently, Kullback and Leibler [77], in 1951, expanded upon this by integrating an additional element, referred to as process $J$. Notably, the concept of transfer entropy gains complexity with the inclusion of more variables and values, indicating a broader and more intricate understanding of entropy.
$$ h_I(k) = -\sum p\big(i_{t+1}, i_t^{(k)}\big) \cdot \log\big(p(i_{t+1} \mid i_t^{(k)})\big) \quad (7) $$

To elaborate further, the marginal probability distributions $p(i)$, $p(j)$ and the joint probability distribution $p(i,j)$ are expected to form a stationary time series. This implies that $i_t^{(k)} = (i_t, \ldots, i_{t-k+1})$ represents a sequence of values over time. Similarly, $h_J(l)$ is defined for process $J$ in a comparable manner. Kullback and Leibler [77], in their 1951 work, introduced a broader application of the Markov process to this context.
$$ p\big(i_{t+1} \mid i_t^{(k)}\big) = p\big(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\big) \quad (8) $$
Transfer entropy revolves around the likelihood of one variable obtaining information from its own past and from another variable ($j_t$). The core idea behind transfer entropy is to quantify the information exchange between two distinct random variables. Schreiber [78] elucidated this approach, where $I$ and $J$ represent two separate processes. The transfer entropy from $J$ to $I$ is defined as the difference between the information absorbed by a future instance of process $I$ at $(t+1)$ from the past values of both $I$ and $J$, and the information absorbed by the same future instance solely from the past values of $I$. In essence, transfer entropy seeks to measure the net information flow:

$$ T_{J \rightarrow I}(k, l) = \sum_{i,j} p\big(i_{t+1}, i_t^{(k)}, j_t^{(l)}\big) \cdot \log\!\left(\frac{p\big(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\big)}{p\big(i_{t+1} \mid i_t^{(k)}\big)}\right) \quad (9) $$
In this context, $T_{J \rightarrow I}$ is used to assess the flow of information from $J$ to $I$. Dimpfl and Peter [79] introduced novel methods, including the Markov block bootstrap and the repeated bootstrap, to this field of study. They base their investigation on the null hypothesis that posits the absence of any information transfer.
$$ RT_{J \rightarrow I}(k, l) = \frac{1}{1-q}\, \log\!\left(\frac{\sum_{i} \varphi_q\big(i_t^{(k)}\big)\, p^{q}\big(i_{t+1} \mid i_t^{(k)}\big)}{\sum_{i,j} \varphi_q\big(i_t^{(k)}, j_t^{(l)}\big)\, p^{q}\big(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\big)}\right) \quad (10) $$

Here, $J$ and $I$ represent two distinct processes, while $q$ is a positive weighting parameter ($q > 0$) applied to the individual probability function $p(\cdot)$ in the computations. Specifically, $i_n$ refers to the $n$-th element of the time series $I$, and $j_n$ denotes the $n$-th element of the time series for the variable $J$. It should be recognized that $\varphi_q(j) = p^{q}(j) / \sum_j p^{q}(j)$ and $\varphi_q(i) = p^{q}(i) / \sum_i p^{q}(i)$ constitute the escort distributions. The primary purpose of introducing the Markov process into this analysis is to estimate the likelihood of transitioning from one state to another during information transfer, as well as to facilitate the prediction of potential transition-matrix scenarios. According to Equation (10), and following the methodology suggested by Bekiros et al. [80], setting $l = k = 1$ allows for denoising of the dataset and enables transfer entropy to detect asymmetric interactions between pairs ($X$ and $Y$) and ($Y$ and $X$), thus offering valuable insights into the dynamics of information flow between two time series. In essence, transfer entropy relies on the logarithmic scale of the number of possible outcomes, determined by a given probability distribution, to analyze information flows.
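As an illustration of Eq. (9) with $k = l = 1$, the sketch below estimates Shannon transfer entropy between two discretized series; the quantile binning and bin count are illustrative choices rather than our exact estimator.

```python
import numpy as np
from collections import Counter

def transfer_entropy(source: np.ndarray, target: np.ndarray, bins: int = 3) -> float:
    """Shannon transfer entropy from `source` to `target` with k = l = 1 (Eq. (9))."""
    # Discretize both series into quantile bins.
    s = np.digitize(source, np.quantile(source, np.linspace(0, 1, bins + 1)[1:-1]))
    t = np.digitize(target, np.quantile(target, np.linspace(0, 1, bins + 1)[1:-1]))

    triples = Counter(zip(t[1:], t[:-1], s[:-1]))   # (i_{t+1}, i_t, j_t)
    pairs_it = Counter(zip(t[1:], t[:-1]))          # (i_{t+1}, i_t)
    pairs_ij = Counter(zip(t[:-1], s[:-1]))         # (i_t, j_t)
    singles = Counter(t[:-1])                       # i_t
    n = len(t) - 1

    te = 0.0
    for (i_next, i_now, j_now), c in triples.items():
        p_joint = c / n
        p_cond_full = c / pairs_ij[(i_now, j_now)]
        p_cond_self = pairs_it[(i_next, i_now)] / singles[i_now]
        te += p_joint * np.log(p_cond_full / p_cond_self)
    return te

# Example: information flow from a simulated sentiment series to simulated returns.
rng = np.random.default_rng(1)
sentiment = rng.standard_normal(500)
returns = 0.3 * np.roll(sentiment, 1) + rng.standard_normal(500)
print(transfer_entropy(sentiment, returns))
```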
B. RESULTS
TABLE 5. Descriptive statistics.

Table 5 presents the descriptive statistics for the case study analyzing the volatility of Bitcoin prices and news sentiment from November 1, 2021, to November 1, 2023. This period witnessed significant fluctuations in Bitcoin prices, as evidenced by a minimum price of $15,766 on November 21, 2022, and a peak of $67,526 on November 8, 2021. Brown [81] suggests that kurtosis values typically ranging from −10 to +10, and skewness values between −3 and +3, are acceptable. In this context, the Bitcoin price exhibits a skewness greater than one and a kurtosis exceeding 3, indicating a distribution with a higher peak and thicker tails than a normal distribution, thereby implying a higher likelihood of extreme values. Regarding news sentiment, the skewness is 0.9311, denoting a moderate positive skew with a longer right tail. The kurtosis of 4.6131, above 3, categorizes the distribution as leptokurtic, suggesting that the news sentiment scores have a sharper peak and heavier tails compared to a normal distribution.
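Such descriptive statistics can be reproduced with a short script like the sketch below (the file and column names are hypothetical); note that SciPy reports excess kurtosis unless fisher=False is set.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("btc_price_and_sentiment.csv")   # hypothetical columns: btc_price, news_sentiment

for col in ["btc_price", "news_sentiment"]:
    skew = stats.skew(df[col].dropna())
    kurt = stats.kurtosis(df[col].dropna(), fisher=False)   # Pearson kurtosis (normal = 3)
    print(f"{col}: skewness={skew:.4f}, kurtosis={kurt:.4f}")
```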
Table 6 details the outcomes of unit root tests conducted on
the variables utilized in this study, explicitly presenting the
Augmented Dickey-Fuller (ADF) test results for each factor.
Despite the fact that stationarity is not a prerequisite for
utilizing the transfer entropy approach, which can handle
probability density functions from a single realization as
highlighted by Wollstadt et al. [82], we nevertheless
proceeded to perform a stationarity test.
For the Bitcoin price, a
f
i
rst-order difference was applied.
The ADF test result for Bitcoin price, with a statistic of
2.4729 and a higher p-value, indicates that the time series is
non-stationary. This means that the null hypothesis of a unit
root for Bitcoin price cannot be rejected at the 5%
signi
f
i
cance level, necessitating further analysis due to its
non-stationary nature.
In contrast, the ADF test for news sentiment yields a statistic of −6.0595 with a p-value of 0.01. This p-value, being below the commonly accepted significance level of 0.05, strongly refutes the null hypothesis of a unit root. Consequently, we can confidently reject the null hypothesis, affirming that the news sentiment time series is stationary.
Other cryptocurrencies show mixed results: BNB has an ADF test statistic of −2.9179, indicating non-stationarity as the null hypothesis cannot be rejected. ETH has an ADF test statistic of −2.4665, which is non-stationary. DOGE shows a statistic of −3.7383, which rejects the null hypothesis at the 5% significance level, indicating stationarity. TRON has an ADF test statistic of −2.7859, indicating non-stationarity. XRP has an ADF test statistic of −2.8493, also indicating non-stationarity. SOL and ADA show statistics of −2.8546 and −4.3383, respectively, indicating stationarity.
These results suggest that while news sentiment is stationary, most of the cryptocurrency prices exhibit non-stationary behavior, requiring second-order differencing before further analysis.
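A minimal sketch of this unit root testing step, using the ADF implementation in statsmodels, is shown below; the input file, column layout, and the re-test of the first difference are illustrative assumptions.

import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical wide table of daily series (BTC, ETH, ..., news sentiment);
# the file name and column layout are illustrative only.
df = pd.read_csv("crypto_sentiment_daily.csv", index_col="date", parse_dates=True)

for col in df.columns:
    series = df[col].dropna()
    stat, pvalue, *_ = adfuller(series, autolag="AIC")
    verdict = "stationary" if pvalue < 0.05 else "non-stationary"
    print(f"{col}: ADF={stat:.4f}, p={pvalue:.4f} -> {verdict}")
    if verdict == "non-stationary":
        # Re-test the first difference, as done for the Bitcoin price series
        stat_d, pvalue_d, *_ = adfuller(series.diff().dropna(), autolag="AIC")
        print(f"  1st difference: ADF={stat_d:.4f}, p={pvalue_d:.4f}")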
1) BITCOIN VOLATILITY: RESULTS FROM THE DCC-GARCH MODEL
The outcome of the DCC-GARCH model is presented in Table 7, illustrating the dynamic adjustments in conditional correlation within a multivariate DCC framework,
TABLE 6. Unit root test results.
complemented by the GARCH model's volatility insights. These statistical results reveal that the price of Bitcoin and news sentiment generally exhibit a similar directional movement; however, this relationship is notably weak. The ρ (Rho) value of 0.1145 indicates a low long-term correlation between Bitcoin prices and news sentiment, suggesting that, on average, they do not move together closely. The α (Alpha) value is 0.00107, which is very small, implying that recent news events exert minimal influence on the immediate volatility of Bitcoin's price. This means that new information or shocks from news have a negligible short-term impact on Bitcoin's volatility. Conversely, the β (Beta) value is 0.9874, which is quite high, indicating that past volatility trends have a substantial and enduring impact on the volatility of Bitcoin's price. This high Beta value suggests that historical news patterns are a significant factor in the longer-term volatility of Bitcoin.
TABLE 7. Estimation results for the DCC-GARCH model.
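For orientation, the sketch below reproduces the first stage of a DCC-GARCH estimation in Python with the arch package: a univariate GARCH(1,1) per series, whose alpha and beta terms correspond to the short-run shock and persistence parameters discussed above. The dynamic-correlation stage itself requires a dedicated implementation (for example, R's rmgarch) and is only approximated here by the unconditional correlation of the standardized residuals; file and column names are hypothetical.

import numpy as np
import pandas as pd
from arch import arch_model

# Hypothetical daily Bitcoin prices and news sentiment scores; the file and
# column names are illustrative placeholders.
df = pd.read_csv("btc_news_daily.csv", index_col="date", parse_dates=True)
returns = pd.DataFrame({
    "btc": 100 * np.log(df["btc_price"]).diff(),
    "news": df["news_sentiment"].diff(),
}).dropna()

# Stage 1 of DCC-GARCH: a univariate GARCH(1,1) for each series, which yields
# the alpha (short-run shock) and beta (volatility persistence) estimates.
std_resid = {}
for col in returns:
    res = arch_model(returns[col], vol="GARCH", p=1, q=1, mean="Constant").fit(disp="off")
    print(col, res.params[["alpha[1]", "beta[1]"]].to_dict())
    std_resid[col] = res.std_resid

# Rough constant-correlation check on the standardized residuals; the
# time-varying correlation of the full DCC stage is omitted here.
z = pd.DataFrame(std_resid).dropna()
print("unconditional correlation of standardized residuals:", z["btc"].corr(z["news"]))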
2) NEWS-INDUCED SPILLOVER EFFECTS IN CRYPTOCURRENCY MARKETS
Transfer entropy values are calculated and detailed in Table 8 and Table 9. It is important to clarify that these values should not be confused with directional or signal
relationships typical of correlations or coefficients. Instead, they should be understood as transfer entropy measures flowing from the 'Sender' to the 'Receiver', indicative of the information transfer between the two entities. Our findings highlight significant spillover effects within the cryptocurrency markets, as gauged by the transfer entropy method.

FIGURE 5. Correlation among cryptocurrencies and news sentiment.
Notably, cryptocurrencies with smaller market capital-
ization tend to react more sensitively compared to their
larger counterparts. For instance, XRP (ranked 7th) and
ADA (ranked 10th) emerge as the most notable recipients.
As shown in Table 9, they are the most sensitive to changes,
receiving signals from 7 other sources, indicating their high
reactivity to market information including news sentiment.
BNB, on the other hand, sends signals to 7 other cryptocurrencies, showcasing its role as a significant influencer within the market.
Conversely, cryptocurrencies with the largest market
capitalization exhibit lower levels of information exchange.
For example, BTC sends information to 7 other cryptocurrencies, illustrating its central role in the market. Despite its significant influence, BTC receives only 4 signals, reflecting its relative stability and lower sensitivity to external shocks compared to smaller cryptocurrencies.
News events that substantially affect Bitcoin's valuation often initiate a domino effect, impacting the valuations of other cryptocurrencies. For instance, it has been observed that news impacts the prices of BNB, ETH, and XRP. As shown in Table 9, News Sentiment sends 4 shocks to other cryptocurrencies but receives only 1 shock from another cryptocurrency, highlighting its role as a significant source of market information.
Nonetheless, Bitcoin exerts a more profound influence on these cryptocurrencies. Given that Bitcoin accounts for approximately 50% of the total market capitalization of all cryptocurrencies, our observations align with those reported in the study by Zhang et al. [83]. This significant market share underscores Bitcoin's extensive connectivity and influence over other cryptocurrencies, including BNB, ETH, DOGE, TRON, XRP, SOL, and ADA.
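The sending and receiving tallies of the kind reported in Table 9 can be derived mechanically from a matrix of significant pairwise transfer entropy links, as in the hypothetical sketch below (the link values shown are placeholders, not the study's results).

import pandas as pd

# Hypothetical matrix of significant transfer entropy links: entry [s, r] is 1
# if information flow from sender s to receiver r is statistically significant.
assets = ["BTC", "ETH", "BNB", "XRP", "ADA", "DOGE", "TRON", "SOL", "News"]
links = pd.DataFrame(0, index=assets, columns=assets)
links.loc["News", ["BTC", "ETH", "BNB", "XRP"]] = 1   # illustrative values only

summary = pd.DataFrame({
    "sends": links.sum(axis=1),      # how many receivers each asset influences
    "receives": links.sum(axis=0),   # how many senders each asset reacts to
})
print(summary.sort_values("receives", ascending=False))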
C. CASE STUDY LIMITATIONS AND CONSIDERATIONS
The findings of this case study reveal a discernible but modest correlation between news sentiment and Bitcoin price fluctuations. Utilizing the robust FinBERT [12] model for sentiment analysis and the DCC-GARCH technique for financial analysis, we gleaned significant insights into the dynamic interplay between public sentiment, as reflected in media, and Bitcoin's price volatility. Specifically, the statistical results from the DCC-GARCH model suggested that historical news patterns wield a more substantial impact on Bitcoin's longer-term volatility than immediate news events. These findings provide insights into the interrelationships between news and the Bitcoin price, underscoring the importance of monitoring news for cryptocurrency markets.
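For readers who wish to reproduce the sentiment-scoring step, a minimal sketch using the publicly released FinBERT checkpoint (ProsusAI/finbert on Hugging Face, which we take to correspond to [12]) is given below; the headlines are illustrative only.

from transformers import pipeline

# ProsusAI/finbert is the public checkpoint associated with FinBERT [12];
# the headlines below are illustrative examples, not data from this study.
finbert = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "Bitcoin rallies after ETF approval rumors resurface",
    "Regulators signal a crackdown on cryptocurrency exchanges",
]
for h, out in zip(headlines, finbert(headlines)):
    print(f"{out['label']:>8}  {out['score']:.3f}  {h}")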
This investigation is subject to certain constraints. Notably, the scope of the data utilized is limited in terms of the timeframe and the range of currencies examined, which may influence the perceived relationship between the variables. Including more data and additional cryptocurrencies in future analyses could alter the outcomes of this study. Furthermore, the diversity of methodologies employed in similar studies poses challenges in directly comparing their results. Exploring factors influencing cryptocurrency development is an evolving area of academic interest that warrants further exploration. Future studies aim to overcome these limitations by incorporating broader and more varied datasets and adopting more uniform research methods, contributing to a more cohesive understanding among scholars in the field.
VIII. CHALLENGES
This section delves into the multifaceted challenges and limitations of using LLMs in sentiment analysis. The technical difficulties are paramount, highlighted by the significant computational and storage demands of models evolving from GPT-1 [46] to those with trillions of parameters, raising concerns about accessibility in resource-limited contexts. LLMs also struggle with generalizability, often failing to maintain consistent performance across diverse domains and tasks. This points to a need for models that are more adaptable and versatile. Additionally, the interpretability and ethical usage of LLMs are crucial, especially in critical sectors like finance, where the opaque nature of these models can hinder trust and reliability.
A. CHALLENGES IN LLM APPLICABILITY
The evolution of LLMs has been characterized by a substantial increase in their size, with a progression from GPT-1's 117 million parameters [46] to GPT-2's 1.5 billion [45] and a dramatic leap to GPT-3's 175 billion parameters [47]. More recent models have continued this trend,
TABLE 8. Transfer entropy matrix.
TABLE 9. Summary of sending and receiving signals.
reaching into the trillions of parameters [84]. Such vast
sizes present formidable challenges regarding storage, mem-
ory, and computational requirements. These challenges are
particularly acute in scenarios with limited resources or real-
time demands, especially when developers cannot access
high-powered GPUs or TPUs. For instance, FinBERT is a
pre-trained model with 110 million parameters, resulting in
a considerable size of 438 MB [12]. The Hugging Face
team [85] notes that training a 176 billion parameter model
like BLOOM [86] on a 1.5 TB dataset consumes 1,082,880
GPU hours. Similarly, training the GPT-NeoX-20B model
[87] on the Pile dataset [88], which includes over 825 GiB
of raw text data, requires eight NVIDIA A100-SXM4-40GB
GPUs. This extensive training can last up to 1,830 hours or
approximately 76 days.
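A back-of-the-envelope calculation makes these storage figures tangible: multiplying the parameter count by the bytes per parameter gives the memory needed just to hold the weights, before activations, gradients, or optimizer state are considered. The sketch below reproduces the rough orders of magnitude quoted above.

def weight_size_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Memory needed just to store the weights (decimal GB); activations,
    gradients, and optimizer state come on top of this."""
    return n_params * bytes_per_param / 1e9

# ~110M fp32 parameters -> close to the 438 MB reported for FinBERT
print(f"FinBERT : {weight_size_gb(110e6) * 1000:.0f} MB")
# 176B parameters in fp16 -> the weights alone exceed a single 40 GB A100
print(f"BLOOM   : {weight_size_gb(176e9, 2):.0f} GB")
# A trillion-parameter model in fp16
print(f"1T model: {weight_size_gb(1e12, 2):.0f} GB")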
Beyond the monetary costs, these models also incur significant energy expenses. Predictions indicate a massive increase in energy usage by platforms employing LLMs [89], raising environmental concerns. However, a growing body of research is aimed at mitigating these challenges. For example, Wang et al. [90] have demonstrated a distillation method that successfully compresses the MiniLM model to a mere 66 million parameters, significantly reducing its size while maintaining efficiency. Increasing LLM sizes poses a complex challenge, necessitating ongoing efforts for more efficient deployment strategies.
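As a generic illustration of the distillation idea (not the specific deep self-attention distillation used for MiniLM [90]), the sketch below shows the classic knowledge-distillation objective: a temperature-softened KL term pulling the student toward the teacher, plus ordinary cross-entropy on the gold labels.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Classic knowledge-distillation recipe: KL divergence between softened
    student and teacher predictions, combined with hard-label cross-entropy.
    MiniLM instead matches self-attention distributions; this is the generic
    logit-level variant shown for illustration."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors (3-class sentiment: negative/neutral/positive)
student = torch.randn(8, 3, requires_grad=True)
teacher = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
print(distillation_loss(student, teacher, labels))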
B. CHALLENGES IN LLM GENERALIZABILITY
Generalizability in LLMs pertains to their capability to
perform tasks accurately and consistently across various
domains, datasets, or functions that differ from their initial
training environment. Although LLMs are often trained on
extensive datasets, encompassing a broad range of
knowledge, their ef
f
i
cacy can be less reliable when applied to
unique or niche tasks outside their primary training scope. This
limitation becomes evident in diverse applications, from
coding projects to document analysis, where the context and
semantics can vary signi
f
i
cantly across different projects,
languages, or domains.
To enhance the generalizability of LLMs, it is crucial to engage in meticulous fine-tuning, apply rigorous validation across diverse datasets, and establish continuous feedback mechanisms. These steps are vital to prevent models from becoming overly specialized in their training data, which can severely restrict their applicability in various real-world scenarios. However, despite these precautions, recent studies indicate that LLMs often struggle to extend their high performance to inputs markedly different from their training data [91]. This limitation highlights a significant gap in the current capabilities of LLMs. The challenge, therefore, lies in developing LLMs that possess extensive knowledge and understanding gleaned from large datasets and exhibit the flexibility and adaptability required to function effectively across a wide range of contexts. Addressing this challenge involves refining the training process and innovating in model architecture and learning algorithms.
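One practical form of the rigorous validation across diverse datasets mentioned above is simply to score the same model on held-out sets from several domains and compare the gaps, as in the hypothetical sketch below (the dataset files and label conventions are assumptions).

import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

# Hypothetical CSV files, each with "text" and "label" columns, drawn from
# different domains (news headlines, tweets, analyst reports).
domains = ["news_test.csv", "tweets_test.csv", "reports_test.csv"]

clf = pipeline("text-classification", model="ProsusAI/finbert")

for path in domains:
    df = pd.read_csv(path)
    preds = [out["label"].lower() for out in clf(df["text"].tolist(), truncation=True)]
    acc = accuracy_score(df["label"].str.lower(), preds)
    f1 = f1_score(df["label"].str.lower(), preds, average="macro")
    print(f"{path}: accuracy={acc:.3f}, macro-F1={f1:.3f}")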
C. CHALLENGES IN LLM INTERPRETABILITY, TRUSTWORTHINESS, AND ETHICAL USAGE
Interpretability and trustworthiness are pivotal in integrating LLMs for sentiment analysis tasks. The primary challenge lies in demystifying the decision-making processes of these models. Due to their 'black-box' nature, elucidating the mechanisms through which they discern sentiment from text is often challenging. Recent studies [92] have underscored this issue, revealing that while LLMs are proficient in sentiment analysis, their opaque internal workings remain a significant barrier. This obscurity in understanding how these models arrive at their conclusions can generate apprehension and reluctance among users, particularly investors who rely on clear and logical reasoning for decision-making [93].
Investors are unlikely to trust the outputs of LLMs without a transparent understanding of the underlying processes. To foster trust in LLMs, it is essential to develop and implement techniques and tools that shed light on the internal mechanics of these models. Such efforts would enable developers and users to trace and understand the rationale behind the outputs generated by LLMs. Improving interpretability and trustworthiness is both a technical necessity and a step towards broader acceptance and use of LLMs in sentiment analysis, leading to more efficient and effective practices in this field [94].
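One simple, model-agnostic way to make a sentiment prediction more traceable is occlusion: remove each token in turn and record how much the predicted sentiment probability changes. The sketch below illustrates this with the FinBERT checkpoint; it is an illustrative technique, not the interpretability method of [92] or [94].

from transformers import pipeline

clf = pipeline("text-classification", model="ProsusAI/finbert", top_k=None)

def positive_prob(text):
    # top_k=None returns one score per label; keep the positive-class score
    scores = {o["label"].lower(): o["score"] for o in clf([text])[0]}
    return scores["positive"]

def occlusion_saliency(text):
    """Drop one word at a time and record how much the positive-sentiment
    probability falls; larger drops indicate words the model leans on."""
    words = text.split()
    base = positive_prob(text)
    return sorted(
        ((w, base - positive_prob(" ".join(words[:i] + words[i + 1:])))
         for i, w in enumerate(words)),
        key=lambda t: t[1], reverse=True,
    )

print(occlusion_saliency("Bitcoin surges after strong institutional demand"))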
Another aspect contributing to the challenge is the closed nature of many LLMs. Often, there is little transparency about what data these models have been trained on, raising questions about the source training data's quality, representativeness, and ownership. This lack of transparency extends to concerns over the ownership of derivative data produced by the models [95]. Furthermore, the potential vulnerability of LLMs to various adversarial attacks, where inputs are maliciously designed to manipulate or confuse the models, adds another layer of complexity. These risks emphasize the need for robust security measures and ethical considerations in developing and deploying LLMs.
IX. FUTURE OPPORTUNITIES
This section highlights the future opportunities for LLMs in sentiment analysis. As these models evolve and gain prominence in academic research, we explore the emerging trends and potential advancements that could shape their role in sentiment analysis. This section reflects on optimizing LLMs for greater efficiency and effectiveness, expands their natural language processing capabilities to encompass a more comprehensive array of input forms, and discusses enhancing their performance in existing sentiment analysis tasks.
A. OPTIMIZATION OF LLM FOR SENTIMENT ANALYSIS
The ascent of ChatGPT in academic research highlights its growing prominence and acceptance in scholarly circles. Researchers have increasingly favored ChatGPT over other LLMs and their applications since its release, primarily due to its computational efficiency, versatility in handling diverse tasks, and potential for cost-effectiveness [96]. Beyond its application in sentiment analysis, ChatGPT has spearheaded an era of enhanced collaboration in the financial sector. This trend marks a significant shift towards incorporating sophisticated natural language understanding into sentiment analysis [97]. By examining these evolving dynamics, we can anticipate the future trajectory of LLMs like ChatGPT in refining and revolutionizing sentiment analysis processes. These developments indicate the transformative potential of LLMs in sentiment analysis.
Regarding the utilization of LLMs, the choice between using commercially available pre-trained models like GPT-4 and opting for open-source alternatives such as LLaMA [51], LLaMA 2 [52], and Alpaca (https://github.com/tatsu-lab/stanford_alpaca) presents distinct avenues for customization in specialized tasks. The critical difference between these approaches lies in their level of control and personalization. Despite their proprietary nature, pre-trained models like GPT-4 enable quick, task-specific adaptations with minimal data requirements. This approach reduces computational demands and expedites deployment. In contrast, open-source frameworks like LLaMA provide a foundation for extensive tailoring. While these models arrive pre-trained, they can be further adapted, with organizations often modifying and retraining them on large-scale datasets specific to their needs [98]. Although this process demands substantial computational resources and investment, it allows for creating models intricately tailored to specific domains.
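As a minimal sketch of the open-source customization route, the code below applies parameter-efficient LoRA adapters (via the peft library) to a small publicly available checkpoint and fine-tunes it on a hypothetical labeled financial sentiment CSV; the model name, dataset, and hyperparameters are placeholders, and larger LLaMA-style models would follow the same pattern with more compute.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Any open checkpoint can stand in here; a small BERT-style model keeps the
# sketch runnable without gated-weight access.
base = "distilbert-base-uncased"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

# LoRA: freeze the base weights and train small low-rank adapters instead,
# a common way of specializing open models to financial text.
model = get_peft_model(model, LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16))

# Hypothetical labeled dataset with "text" and integer "label" columns.
ds = load_dataset("csv", data_files={"train": "fin_sentiment_train.csv"})["train"]
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length",
                          max_length=128), batched=True)

Trainer(
    model=model,
    args=TrainingArguments("finetuned-sentiment", per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=ds,
).train()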
B. EXPANDING LLM'S NLP CAPABILITIES IN MORE SENTIMENT ANALYSIS PHASES
Throughout our analysis, it became apparent that most data inputs for LLMs in sentiment analysis were text-based. This finding aligns with traditional NLP approaches, yet it also reveals a noticeable gap in the utilization of more diverse and complex datasets, particularly graph-based ones. Embracing a more comprehensive array of natural language inputs, such as spoken language, diagrams, and multimodal data, could significantly expand the capabilities of LLMs in capturing and interpreting varied forms of user sentiment [99].
Integrating spoken language into LLMs could enhance user interactions, enabling the models to process more natural and contextually rich conversations. This addition would allow LLMs to better understand nuances in tone, intonation, and colloquial expressions, which are often lost in text-based communication. Similarly, including diagrams could provide valuable visual representations of complex ideas or emotions, offering a unique dimension to sentiment analysis [100]. Diagrams can be a powerful tool to convey information that may be difficult to express through words alone.
Moreover, multimodal inputs that amalgamate text, audio, and visual elements could lead to a more holistic understanding of context. Such a comprehensive approach would likely result in more accurate and context-sensitive sentiment analysis outcomes. For instance, combining textual data with vocal intonations and facial expressions could yield a better understanding of the user's emotional state and intentions [101], [102].
C. ENHANCING LLM'S PERFORMANCE IN EXISTING SENTIMENT ANALYSIS TASKS
In academic research, establishing a universal and adaptable evaluation framework for LLMs in sentiment analysis is becoming increasingly imperative. Such a framework is essential for conducting systematic and consistent assessments of LLMs, focusing on their performance, efficacy, and potential limitations. This standardization would serve as a critical benchmark, enabling researchers to verify the practical readiness of these models for various applications.
A standardized evaluation framework would offer a comprehensive set of criteria and metrics against which to measure LLMs, ensuring that their capabilities are accurately and objectively assessed [103]. In academia, where rigorous analysis and validation are paramount, the absence of such a framework can lead to fragmented and inconsistent evaluations of LLMs, potentially impeding their development and adoption. By establishing a universally accepted framework, researchers can compare different LLMs on a level playing field, fostering a clearer understanding of each model's strengths and areas for improvement. This framework should ideally encompass a range of considerations, including accuracy in sentiment detection, adaptability to different linguistic contexts, computational efficiency, and ethical concerns such as bias and fairness [104], [105].
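A hypothetical slice of such a framework is sketched below: one report row per model covering predictive quality (macro-F1), computational efficiency (latency per text), and a crude fairness probe (how often the prediction flips when an entity name is swapped). The metrics and probe are illustrative choices, not a proposed standard.

import time
from dataclasses import dataclass
from sklearn.metrics import f1_score
from transformers import pipeline

@dataclass
class EvalReport:
    """One row of a hypothetical standardized evaluation report."""
    model_name: str
    macro_f1: float
    ms_per_text: float
    entity_swap_flip_rate: float  # share of predictions that change when a
                                  # company name is swapped for another

def evaluate(model_name, texts, labels, swapped_texts):
    clf = pipeline("text-classification", model=model_name)
    start = time.perf_counter()
    preds = [o["label"].lower() for o in clf(texts, truncation=True)]
    ms = 1000 * (time.perf_counter() - start) / len(texts)
    swapped = [o["label"].lower() for o in clf(swapped_texts, truncation=True)]
    flips = sum(p != s for p, s in zip(preds, swapped)) / len(preds)
    return EvalReport(model_name, f1_score(labels, preds, average="macro"), ms, flips)

# Usage (texts, labels, and entity-swapped variants are user-supplied):
# report = evaluate("ProsusAI/finbert", texts, labels, swapped_texts)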
Furthermore, a universal evaluation framework would
facilitate responsible LLM adoption in academic research
[106]. It would provide scholars with the tools to decide
which models best suit their research needs and objectives.
X. CONCLUSION
In this comprehensive literature review, we examine the intersection of LLMs and sentiment analysis within financial markets, providing a detailed exploration of LLMs' evolution, application, and future opportunities in this domain. The review navigates through the intricacies of sentiment analysis, underlining its significance in understanding market dynamics and investor behavior. Our meticulous analysis of LLMs, mainly their development from BERT [14] to more sophisticated models like FinBERT [12] and ChatGPT, reveals these models' substantial impact on financial sentiment analysis.
The review methodically dissects the role of LLMs in various financial contexts, from cryptocurrency market prediction to stock price forecasting, showcasing their capability to extract and interpret complex economic sentiments. The case study on Bitcoin price and news sentiment further exemplifies the practical application of LLMs, reinforcing that sentiment analysis, powered by advanced language models, is pivotal in deciphering market trends.
However, the review also candidly addresses the challenges and limitations inherent in the current state of LLMs. Issues such as the immense computational requirements, difficulties in generalizability and interpretability, and ethical concerns are thoughtfully discussed, providing a balanced perspective. Our call for more efficient deployment strategies, improved generalizability, and enhanced interpretability is particularly compelling, indicating the need for continued innovation in this field.
Looking to the future, integrating more diverse data types and establishing a universal evaluation framework are essential steps toward enhancing the efficacy of LLMs in sentiment analysis. The potential expansion of LLM capabilities to include multimodal data inputs and the implementation of a standard evaluation framework are highlighted as promising avenues for research and development.
REFERENCES
[1] M. Baker and J. Wurgler, ‘Investor sentiment in the stock market,’
J. Econ. Perspect., vol. 21, no. 2, pp. 129–152, 2007.
[2] P. C. Tetlock, ‘Giving content to investor sentiment: The role of
media in the stock market,’ J. Finance, vol. 62, pp. 1139–1168, Jun.
2007. [Online]. Available: https://onlinelibrary.wiley.com/doi/full/
10.1111/j.1540-6261.2007.01232.x
[3] L. A. Smales, ‘The importance of fear: Investor sentiment and
stock market returns, Appl. Econ., vol. 49, no. 34, pp. 3395–3421,
Jul. 2017. [Online]. Available: https://www.tandfonline.com/doi/abs/
10.1080/00036846.2016.1259754
[4] T. Rao and S. Srivastava. (2012). Analyzing Stock Market Movements
Using Twitter Sentiment Analysis. [Online]. Available: http://dx.doi.org/
10.1109/ASONAM.2012.30 and https://repository.lincoln.ac.uk/articles/
conference_contribution/Analyzing_stock_market_movements_using_T
witter_sentiment_analysis/25165223/2?file=44450105
[5] E. Cambria and B. White, ‘Jumping NLP curves: A review of natural
language processing research,IEEE Comput. Intell. Mag., vol. 9, no. 2,
pp. 48–57, May 2014.
[6] V. Ramiah, X. Xu, and I. A. Moosa, 'Neoclassical finance, behavioral finance and noise traders: A review and assessment of the literature,' Int. Rev. Financial Anal., vol. 41, pp. 89–100, Oct. 2015.
[7] F. Wu, Y. Huang, and Y. Song, 'Structured microblog sentiment classification via social context regularization,' Neurocomputing, vol. 175, pp. 599–609, Jan. 2016.
[8] T. Al-Moslmi, S. Gaber, M. Albared, and N. Omar. (2016). Feature
Selection Methods Effects on Machine Learning Approaches in Malay
Sentiment Analysis. [Online]. Available: https://www.researchgate.
net/publication/308968243
[9] R. C. Moore and W. Lewis, ‘Intelligent selection of language model
training data,’ in Proc. ACL Conf. Short Papers, 2010, pp. 220–224.
[10] S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P.
Kambadur, D. Rosenberg, and G. Mann, ‘BloombergGPT: A large
language model for finance,' 2023, arXiv:2303.17564.
[11] K. Mishev, A. Gjorgjevikj, I. Vodenska, L. T. Chitkushev, and D.
Trajanov, 'Evaluation of sentiment analysis in finance: From lexicons to transformers,' IEEE Access, vol. 8, pp. 131662–131682, 2020.
[12] D. Araci, ‘FinBERT: Financial sentiment analysis with pre-trained
language models,’ 2019, arXiv:1908.10063.
[13] P. Seroyizhko, Z. Zhexenova, M. Z. Shafiq, F. Merizzi, A. Galassi, and F. Ruggeri, 'A sentiment and emotion annotated dataset for Bitcoin price forecasting based on Reddit posts,' in Proc. 4th Workshop Financial Technol. Natural Lang. Process. (FinNLP), 2022, pp. 203–210. [Online]. Available: https://aclanthology.org/2022.finnlp-1.27
[14] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ‘BERT: Pre-training of
deep bidirectional transformers for language understanding,in Proc.
Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang.
Technol., vol. 1, Oct. 2018, pp. 4171–4186.
[15] J. Kocon, I. Cichecki, O. Kaszyca, M. Kochanek, D. Szydlo, J. Baran,
J. Bielaniewicz, M. Gruza, A. Janz, K. Kanclerz, A. Kocon, B. Koptyra,
W. Mieleszczenko-Kowszewicz, P. Milkowski, M. Oleksy, M. Piasecki,
L. Radlinski, K. Wojtasik, S. Wozniak, and P. Kazienko, ‘ChatGPT:
Jack of all trades, master of none, Inf. Fusion, vol. 99, Nov. 2023,
Art. no. 101861.
[16] M. Chakraborty and S. Subramaniam, ‘Does sentiment impact
cryptocurrency? J. Behav. Finance, vol. 24, no. 2, pp. 202–218,
Apr. 2023. [Online]. Available: https://www.tandfonline.com/doi/abs/
10.1080/15427560.2021.1950723
[17] A. H. Huang, H. Wang, and Y. Yang, ‘FinBERT: A large language model
for extracting information from financial text,' Contemp. Accounting
Res., vol. 40, no. 2, pp. 806–841, May 2023. [Online]. Available:
https://onlinelibrary.wiley.com/doi/full/10.1111/1911-3846.12832
[18] H. Tong, J. Li, N. Wu, M. Gong, D. Zhang, and Q. Zhang, ‘Ploutos:
Towards interpretable stock movement prediction with financial large
language model,’ 2024, arXiv:2403.00782.
[19] A. S. George and A. H. George, ‘‘A review of ChatGPT AIs impact on
several business sectors,Partners Universal Int. Innov. J., vol. 1, no. 1,
pp. 9–23, 2023.
[20] N. A. Sharma, A. B. M. S. Ali, and M. A. Kabir, ‘‘A review of sentiment
analysis: Tasks, applications, and deep learning techniques,Int. J. Data
Sci. Anal., pp. 1–38, Jul. 2024, doi:
10.1007/s41060-024-00594-x.
[21] M. A. K. Raiaan, M. S. H. Mukta, K. Fatema, N. M. Fahad, S. Sakib, M.
M. J. Mim, J. Ahmad, M. E. Ali, and S. Azam, A review on large
language models: Architectures, applications, taxonomies, open issues
and challenges,IEEE Access, vol. 12, pp. 26839–26874, 2024.
[22] B. Chen, Z. Wu, and R. Zhao, 'From fiction to fact: The growing role of generative AI in business and finance,' J. Chin. Econ. Bus. Stud., vol. 21,
no. 4, pp. 471–496, Oct. 2023.
[23] M. M. Dong, T. C. Stratopoulos, and V. X. Wang, A Scoping Review of
ChatGPT Research in Accounting and Finance, T. C. Wang and V. Xiaoqi,
Eds., Dec. 2023. [Online]. Available: https://ssrn.com/abstract=4680203
and http://dx.doi.org/10.2139/ssrn.4680203
[24] S. A. Farimani, M. V. Jahan, and A. M. Fard, 'From text representation to financial market prediction: A literature review,' Information, vol. 13, no.
10, p. 466, Sep. 2022.
[25] A. Koshiyama, N. Firoozye, and P. Treleaven, Algorithms in future
capital markets: A survey on AI, ML and associated algorithms in capital
markets,’’ in Proc. 1st ACM Int. Conf. AI Finance, 2020, pp. 1–8.
[26] O. Bashchenko, ‘Bitcoin price factors: Natural language processing
approach, SSRN Electron. J., vol. 13, pp. 22–48, Mar. 2022. [Online].
Available: https://papers.ssrn.com/abstract=4079091
[27] B. N. Thanh, A. T. Nguyen, T. T. Chu, and S. Ha. (2023). ChatGPT,
Twitter Sentiment and Bitcoin Return. [Online]. Available: https://
papers.ssrn.com/abstract=4628097
[28] B. Kitchenham. (2007). Guidelines for Performing Systematic
Literature Reviews in Software Engineering. [Online]. Available:
https://www.researchgate.net/publication/302924724
[29] B. Kitchenham, L. Madeyski, and D. Budgen, ‘SEGRESS: Software
engineering guidelines for REporting secondary studies, IEEE Trans.
Softw. Eng., vol. 49, no. 3, pp. 1273–1298, Mar. 2023.
[30] H. Zhao, Z. Liu, Z. Wu, Y. Li, T. Yang, P. Shu, S. Xu, H. Dai, L. Zhao, G.
Mai, N. Liu, and T. Liu, 'Revolutionizing finance with LLMs: An
overview of applications and insights,’ 2024, arXiv:2401.11641.
[31] K. Du, F. Xing, R. Mao, and E. Cambria, ‘‘Financial sentiment analysis:
Techniques and applications, ACM Comput. Surv., vol. 56, no. 9, pp.
1–42, Oct. 2024.
[32] M. N. Ashtiani and B. Raahemi, 'News-based intelligent prediction of financial markets using text mining and machine learning: A
systematic literature review,’ Expert Syst. Appl., vol. 217, May 2023,
Art. no. 119509.
[33] M. Shanahan, Talking about large language models, 2022,
arXiv:2212.03551.
[34] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi,
Q. V. Le, and D. Zhou, ‘‘Chain-of-thought prompting elicits reasoning in
large language models,’ in Proc. Adv. Neural Inf. Process. Syst., vol. 35,
2022, pp. 24824–24837.
[35] R. Taylor, M. Kardas, G. Cucurull, T. Scialom, A. Hartshorn, E. Saravia, A.
Poulton, V. Kerkez, and R. Stojnic, ‘‘Galactica: A large language model
for science,’ 2022, arXiv:2211.09085.
[36] J. Hoffmann et al., ‘Training compute-optimal large language models,
2022, arXiv:2203.15556.
[37] J. Xu Zhao, Y. Xie, K. Kawaguchi, J. He, and M. Q. Xie, Automatic
model selection with large language models for reasoning, 2023,
arXiv:2305.14333.
[38] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, and X. Wu, ‘Unifying
large language models and knowledge graphs: A roadmap, 2023,
arXiv:2306.08302.
[39] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut,
‘‘ALBERT: A lite BERT for self-supervised learning of language
representations,’ in Proc. 8th Int. Conf. Learn. Represent. (ICLR), 2020,
pp. 1–17.
[40] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L.
Zettlemoyer, and V. Stoyanov, ‘RoBERTa: A robustly optimized
BERT pretraining approach,’ 2019, arXiv:1907.11692.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L.
Kaiser, and I. Polosukhin, Attention is all you need,in Proc. Adv.
Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11.
[42] M. Kulakowski and F. Frasincar, 'Sentiment classification of
cryptocurrency-related social media posts, IEEE Intell. Syst., vol. 38,
no. 4, pp. 5–9, Jul. 2023.
[43] Y. Yu, H. Li, Z. Chen, Y. Jiang, Y. Li, D. Zhang, R. Liu, J. W. Suchow, and K.
Khashanah, ‘‘FinMem: A performance-enhanced LLM trading agent with
layered memory and character design,’ 2023, arXiv:2311.13743.
[44] Y. Li, Y. Yu, H. Li, Z. Chen, and K. Khashanah, ‘TradingGPT: Multi-
agent system with layered memory and distinct characters for enhanced financial trading performance,' 2023, arXiv:2309.03736.
[45] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, 'Lan-
guage models are unsupervised multitask learners,OpenAI Blog, vol. 1,
no. 8, p. 9, 2019. [Online]. Available: https://github.com/codelucas/
newspaper
[46] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, Improving
Language Understanding by Generative Pre-training. Accessed: Feb. 3,
2024. [Online]. Available: https://gluebenchmark.com/leaderboard
[47] T. B. Brown et al., ‘Language models are few-shot learners,’ in
Proc. NIPS, 2020, pp. 1877–1901. [Online]. Available: https://
commoncrawl.org/the-data/
[48] J. Achiam et al., ‘GPT-4 technical report,’ 2023, arXiv:2303.08774.
[49] G. Fatouros, J. Soldatos, K. Kouroumali, G. Makridis, and D. Kyriazis,
‘Transforming sentiment analysis in the
f
i
nancial domain with
ChatGPT,Mach. Learn. Appl., vol. 14, Dec. 2023, Art. no. 100508.
[50] B. Zhang, H. Yang, and X.-Y. Liu, 'Instruct-FinGPT: Financial sentiment analysis by instruction tuning of general-purpose large language models,'
2023, arXiv:2306.12659.
[51] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux,
T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez,
A. Joulin, E. Grave, and G. Lample, 'LLaMA: Open and efficient foundation language models,' 2023, arXiv:2302.13971.
[52] H. Touvron et al., 'Llama 2: Open foundation and fine-tuned chat models,' 2023, arXiv:2307.09288.
[53] Q. Xie, W. Han, X. Zhang, Y. Lai, M. Peng, A. Lopez-Lira, and J. Huang,
‘PIXIU: A large language model, instruction data and evaluation
benchmark for finance,' 2023, arXiv:2306.05443.
[54] B. Peng, E. Chersoni, Y.-Y. Hsu, L. Qiu, and C.-R. Huang, ‘Supervised
cross-momentum contrast: Aligning representations with prototypical
examples to enhance financial sentiment analysis,' Knowl.-Based Syst.,
vol. 295, Jul. 2024, Art. no. 111683.
[55] C. He, C. Li, T. Han, and L. Shen, Assessing and enhancing LLMs: A
physics and history dataset and one-more-check pipeline method,in
Proc. Int. Conf. Neural Inf. Process., 2024, pp. 504–517.
[56] Z.Chen,W.Chen,C.Smiley,S.Shah,I.Borova,D.Langdon,R. Moussa,
M. Beane, T.-H. Huang, B. Routledge, and W. Y. Wang, ‘FinQA:
A dataset of numerical reasoning over financial data,' in Proc. Conf.
Empirical Methods Natural Lang. Process., 2021, pp. 3697–3711.
[57] Z. Liu, D. Huang, K. Huang, Z. Li, and J. Zhao, 'FinBERT: A pre-trained financial language representation model for financial text mining,' in Proc. 29th Int. Joint Conf. Artif. Intell., Jul. 2020, pp. 4513–4519.
[Online]. Available: http://commoncrawl.org/
[58] P. Malo, A. Sinha, P. Korhonen, J. Wallenius, and P. Takala, ‘‘Good debt or
bad debt: Detecting semantic orientations in economic texts,’J. Assoc. Inf.
Sci. Technol., vol. 65, no. 4, pp. 782–796, Apr. 2014.
[59] K. Cortis, A. Freitas, T. Daudert, M. Huerlimann, M. Zarrouk, S.
Handschuh, and B. Davis, ‘SemEval-2017 task 5: Fine-grained
sentiment analysis on financial microblogs and news,' in Proc. 11th Int.
Workshop Semantic Eval. (SemEval), 2017, pp. 519–535.
[60] G. Kim, M. Kim, B. Kim, and H. Lim, ‘‘CBITS: Crypto BERT incorpo-
rated trading system,IEEE Access, vol. 11, pp. 6912–6921, 2023.
[61] Y. Zou and D. Herremans, ‘PreBit—A multimodal model with Twitter
FinBERT embeddings for extreme price movement prediction of
Bitcoin,Expert Syst. Appl., vol. 233, Dec. 2023, Art. no. 120838.
[62] A. Raheman, A. Kolonin, I. Fridkins, I. Ansari, and M. Vishwas, ‘Social
media sentiment analysis for cryptocurrency market prediction, 2022,
arXiv:2204.10185.
[63] M. Ortu, N. Uras, C. Conversano, S. Bartolucci, and G. Destefanis,
‘On technical trading and social media indicators for cryptocurrency
price classification through deep learning,' Expert Syst. Appl., vol. 198,
Jul. 2022, Art. no. 116804.
[64] B. Fazlija and P. Harder, 'Using financial news sentiment for stock price direction prediction,' Mathematics, vol. 10, no. 13, p. 2156, Jun. 2022.
direction prediction,Mathematics, vol. 10, no. 13, p. 2156, Jun. 2022.
[Online]. Available: https://www.mdpi.com/2227-7390/10/13/2156/htm
[65] U. Gupta, ‘GPT-InvestAR: Enhancing stock investment strategies
through annual report analysis with large language models, 2023,
arXiv:2309.03079.
[66] X. Deng, V. Bashlovkina, F. Han, S. Baumgartner, and M. Bendersky,
'What do LLMs know about financial markets? A case study on Reddit
market sentiment analysis,’ 2022, arXiv:2212.11311.
[67] H. Q. Abonizio, E. C. Paraiso, and S. Barbon, Toward text data
augmentation for sentiment analysis,IEEE Trans. Artif. Intell., vol. 3,
no. 5, pp. 657–668, Oct. 2022.
[68] D. Ider and S. Lessmann, ‘‘Forecasting cryptocurrency returns from sen-
timent signals: An analysis of BERT classifiers and weak supervision,'
2022, arXiv:2204.05781.
[69] J. de Curtò, I. de Zarzà, G. Roig, J. C. Cano, P. Manzoni, and C. T.
Calafate, ‘‘LLM-informed multi-armed bandit strategies for non-
stationary environments,' Electronics, vol. 12, no. 13, p. 2814, Jun. 2023.
[Online]. Available: https://www.mdpi.com/2079-9292/12/13/2814/htm
[70] M. Fernandes, S. Khanna, L. Monteiro, A. Thomas, and G. Tripathi,
‘Bitcoin price prediction,in Proc. Int. Conf. Adv. Comput., Commun.,
Control (ICAC3), Dec. 2021, pp. 1–4.
[71] P. Malo, A. Sinha, P. Takala, P. Korhonen, and J. Wallenius, ‘‘Good debt or
bad debt: Detecting semantic orientations in economic texts, 2013,
arXiv:1307.5336.
[72] S. A. Farimani, M. V. Jahan, A. M. Fard, and S. R. K. Tabbakh,
‘Investigating the informativeness of technical indicators and news
sentiment in financial market price prediction,' Knowl.-Based Syst.,
vol. 247, Jul. 2022, Art. no. 108742.
[73] T. C. Chiang, B. N. Jeon, and H. Li, 'Dynamic correlation analysis of financial contagion: Evidence from Asian markets,' J. Int. Money
Finance, vol. 26, no. 7, pp. 1206–1228, Nov. 2007.
[74] K. J. Forbes and R. Rigobon, ‘No contagion, only interdependence:
Measuring stock market comovements, J. Finance, vol. 57,
no. 5, pp. 2223–2261, Oct. 2002. [Online]. Available: https://
onlinelibrary.wiley.com/doi/full/10.1111/0022-1082.00494
[75] L. Barnett, A. B. Barrett, and A. K. Seth, ‘Granger causality and transfer
entropy are equivalent for Gaussian variables,Phys. Rev. Lett., vol. 103,
no. 23, Dec. 2009, Art. no. 238701.
[76] C. E. Shannon, ‘‘A mathematical theory of communication, Bell Syst.
Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
[77] S. Kullback and R. A. Leibler, 'On information and sufficiency,' Ann.
Math. Statist., vol. 22, no. 1, pp. 79–86, 1951.
[78] T. Schreiber, ‘Measuring information transfer,’Phys. Rev. Lett., vol. 85,
no. 2, pp. 461–464, Jul. 2000.
[79] T. Dimpfl and F. J. Peter, 'Using transfer entropy to measure information flows between financial markets,' Stud. Nonlinear Dyn. Econometrics,
vol. 17, no. 1, pp. 85–102, 2013.
[80] S. Bekiros, D. K. Nguyen, L. S. Junior, and G. S. Uddin, ‘‘Information
diffusion, cluster formation and entropy-based network dynamics in
equity and commodity markets, Eur. J. Oper. Res., vol. 256, no. 3, pp.
945–961, Feb. 2017.
[81] T. A. Brown, Confirmatory Factor Analysis for Applied Research. NY,
USA: Guilford publications, 2015.
[82] P. Wollstadt, M. Martínez-Zarzuela, R. Vicente, F. J. Díaz-Pernas, and M.
Wibral, 'Efficient transfer entropy analysis of non-stationary neural time
series,PLoS ONE, vol. 9, no. 7, Jul. 2014, Art. no. e102833.
[83] H. Zhang, H. Hong, Y. Guo, and C. Yang, ‘‘Information spillover effects
from media coverage to the crude oil, gold, and Bitcoin markets during the
COVID-19 pandemic: Evidence from the time and frequency domains,
Int. Rev. Econ. Finance, vol. 78, pp. 267–285, Mar. 2022.
[84] S. Moss, 'Google Brain unveils trillion-parameter AI language model, the
largest yet,’ Tech. Rep., 2021.
[85] S. Bekman, ‘The technology behind Bloom training,Tech. Rep., 2022.
[86] T. L. Scao et al., ‘BLOOM: A 176B-parameter open-access multilingual
language model,’ 2022, arXiv:2211.05100.
[87] S. Black, S. Biderman, E. Hallahan, Q. Anthony, L. Gao, L. Golding,
H. He, C. Leahy, K. McDonell, J. Phang, M. Pieler, U. S. Prashanth,
S. Purohit, L. Reynolds, J. Tow, B. Wang, and S. Weinbach,
‘GPT-NeoX-20B: An open-source autoregressive language model,
2022, arXiv:2204.06745.
[88] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang,
H. He, A. Thite, N. Nabeshima, S. Presser, and C. Leahy, ‘The Pile:
An 800 GB dataset of diverse text for language modeling, 2021,
arXiv:2101.00027.
[89] M. C. Rillig, M. Ågerstrand, M. Bi, K. A. Gould, and U. Sauerland,
'Risks and benefits of large language models for the environment,' Environ. Sci. Technol., vol. 57, no. 9, pp. 3464–3466, Mar. 2023. [Online].
Available: https://pubs.acs.org/doi/full/10.1021/acs.est.3c01106
[90] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou, ‘‘MiniLM:
Deep self-attention distillation for task-agnostic compression of pre-
trained transformers,’ in Proc. Adv. Neural Inf. Process. Syst., vol. 2020,
2020, pp. 5776–5788.
[91] A. Albalak, A. Shrivastava, C. Sankar, A. Sagar, and M. Ross, ‘Data-
ef
f
i
ciency with a single GPU: An exploration of transfer methods for
small language models,’ 2022, arXiv:2210.03871.
[92] X. Deng, V. Bashlovkina, F. Han, S. Baumgartner, and M. Bendersky,
'What do LLMs know about financial markets? A case study on Reddit market sentiment analysis,' in Proc. ACM Web Conf., 2022, pp. 107–110.
[Online]. Available: https://dl.acm.org/doi/10.1145/3543873.3587324
[93] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao,
‘ReAct: Synergizing reasoning and acting in language models, 2022,
arXiv:2210.03629.
[94] X. Li, H. Xiong, X. Li, X. Wu, X. Zhang, J. Liu, J. Bian, and
D. Dou, ‘Interpretable deep learning: Interpretation, interpretability,
trustworthiness, and beyond, Knowl. Inf. Syst., vol. 64, no. 12,
pp. 3197–3234, Dec. 2022. [Online]. Available: https://link.springer.
com/article/10.1007/s10115-022-01756-8
[95] S. Sinha, H. Chen, A. Sekhon, Y. Ji, and Y. Qi, ‘Perturbing inputs
for fragile interpretations in deep natural language processing,' in Proc. 4th BlackboxNLP Workshop Analyzing Interpreting Neural
Netw. NLP, 2021, pp. 420–434. [Online]. Available:
https://aclanthology.org/2021.blackboxnlp-1.33
[96] M. T. R. Laskar, M. S. Bari, M. Rahman, M. A. H. Bhuiyan, S. Joty,
and J. X. Huang, A systematic study and comprehensive evaluation
of ChatGPT on benchmark datasets, in Proc. Annu. Meeting Assoc.
Comput. Linguistics, 2023, pp. 431–469.
[97] M. U. Haque, I. Dharmadasa, Z. T. Sworna, R. N. Rajapakse, and H.
Ahmad, ‘‘‘I think this is the most disruptive technology’: Exploring
sentiments of ChatGPT early adopters using Twitter data, 2022,
arXiv:2212.05856.
[98] I. Gur, O. Nachum, Y. Miao, M. Safdari, A. Huang, A. Chowdhery, S.
Narang, N. Fiedel, and A. Faust, ‘Understanding HTML with large
language models,’ 2022, arXiv:2210.03945.
[99] X. Wang, J. He, Z. Jin, M. Yang, Y. Wang, and H. Qu, ‘M2Lens:
Visualizing and explaining multimodal models for sentiment analysis,
IEEE Trans. Vis. Comput. Graphics, vol. 28, no. 1, pp. 802–812, Jan.
2022.
[100] H. Song, J. Li, Z. Xia, Z. Yang, and X. Du, ‘‘Multimodal sentiment
analysis based on pre-LN transformer interaction, in Proc. IEEE 6th
Inf. Technol. Mechatronics Eng. Conf. (ITOEC), vol. 6, Mar. 2022,
pp. 1609–1613.
[101] K. Dashtipour, M. Gogate, E. Cambria, and A. Hussain, A novel
context-aware multimodal framework for Persian sentiment analysis,
Neurocomputing, vol. 457, pp. 377–388, Oct. 2021.
[102] U. Sehar, S. Kanwal, K. Dashtipur, U. Mir, U. Abbasi, and F. Khan, 'Urdu
sentiment analysis via multimodal data mining based on deep learning
algorithms,IEEE Access, vol. 9, pp. 153072–153082, 2021.
[103] A. Oussous, F.-Z. Benjelloun, A. A. Lahcen, and S. Belfkih,
‘‘ASA: A framework for Arabic sentiment analysis, J. Inf. Sci.,
vol. 46, no. 4, pp. 544–559, Aug. 2020. [Online]. Available:
https://journals.sagepub.com/doi/10.1177/0165551519849516
[104] Z. Ke, J. Sheng, Z. Li, W. Silamu, and Q. Guo, ‘‘Knowledge-guided
sentiment analysis via learning from natural language explanations,
IEEE Access, vol. 9, pp. 3570–3578, 2021.
[105] Q. Zhang, J. Zhou, Q. Chen, Q. Bai, J. Xiao, and L. He, A
knowledge-enhanced adversarial model for cross-lingual structured
sentiment analysis, in Proc. Int. Joint Conf. Neural Netw., Jul. 2022,
pp. 1–8.
[106] G. F. N. Mvondo, B. Niu, and S. Eivazinezhad, ‘Generative con-
versational AI and academic integrity: A mixed method investigation
to understand the ethical use of LLM chatbots in higher educa-
tion, SSRN Electron. J., 2023. [Online]. Available: https://ssrn.com/
abstract=4548263 and http://dx.doi.org/10.2139/ssrn.4548263
CHENGHAO LIU received the B.Sc. degree in software engineering from Jiangxi University of Finance and Economics, in 2020. He is currently pursuing the master's degree with The University of Auckland. His research interests include machine learning and large language models to solve financial problems.
ARUNKUMAR ARULAPPAN (Member, IEEE) received the B.Tech. degree in information technology from Anna University, Chennai, India, the M.Tech. degree in computer science and engineering from Vellore Institute of Technology (VIT), Vellore, India, and the Ph.D. degree from the Faculty of Information and Communication Engineering, Anna University, in 2023. He is an Assistant Professor with the School of Computer Science Engineering and Information Systems (SCORE), VIT University. He is proficient with simulator tools MATLAB, ns-3, Mininet, OpenNet VM, and P4 programming. He is exposed to open source tools, such as OpenStack, Cloudify, OPNFV, and Cloud-Native Computing Foundation (CNCF). His research interests include cloud-native deployment, SDN, NFV, 5G/6G networks, AI/ML-based networking, the Internet of Vehicles, and UAV communications.
RANESH NAHA (Member, IEEE) received the M.Sc. degree in parallel and distributed computing from Universiti Putra Malaysia, and the Ph.D. degree in information technology from the University of Tasmania, Australia. He is a Senior Lecturer of information systems with Queensland University of Technology (QUT). He has authored more than 50 peer-reviewed scientific research articles. His research interests include distributed computing (fog/edge/cloud), the Internet of Things (IoT), AI and ML, software-defined networking (SDN), cybersecurity, and blockchain.
ANIKET MAHANTI (Senior Member, IEEE) received the B.Sc. degree (Hons.) in computer science from the University of New Brunswick, Canada, and the M.Sc. and Ph.D. degrees in computer science from the University of Calgary, Canada. He is a Senior Lecturer (an Associate Professor) of computer science with The University of Auckland, New Zealand. His research interests include network science, distributed systems, and internet measurements.
JOARDER KAMRUZZAMAN (Senior Member, IEEE) received the B.Sc. and M.Sc. degrees in electrical and electronic engineering from Bangladesh University of Engineering and Technology, Dhaka, and the Ph.D. degree in information systems engineering from the Muroran Institute of Technology, Hokkaido, Japan. He is a Professor of information technology and the Director of the Centre for Smart Analytics, Federation University Australia. Previously, he was with Monash University, Australia, as an Associate Professor; and Bangladesh University of Engineering and Technology, as a Professor. He has been listed in Stanford's Top 2% Scientists list, since 2020. He has published more than 300 peer-reviewed articles, which include over 110 journals and 180 conference papers. His publications are cited over 7300 times and have an H-index of 36, a g-index of 79, and an i-10 index of 115. He has received over A$5.0m in competitive research funding, including a highly prestigious Australian Research Council Grant and Large Collaborative Research Centre Grants. His research interests include the Internet of Things, machine learning, and cybersecurity. He was a recipient of the Best Paper Award in four international conferences, such as ICICS'15, Singapore; APCC'14, Thailand; IEEE WCNC'10, Sydney, Australia; and IEEE-ICNNSP'03, Nanjing, China. He has served many conferences in leadership capacities, including the program co-chair, the publicity chair, the track chair, and the session chair. Since 2012, he has been an Editor of the Journal of Network and Computer Applications (Elsevier). He served as the Lead Guest Editor for the journal Future Generation Computer Systems (Elsevier).
IN-HO RA (Member, IEEE) received the Ph.D. degree in computer engineering from Chung-Ang University, Seoul, South Korea, in 1995. From February 2007 to August 2008, he was a Visiting Scholar with the University of South Florida, Tampa, FL, USA. He has been with the School of Computer, Information and Communication Engineering, Kunsan National University, where he is currently a Professor. His research interests include wireless ad hoc and sensor networks, blockchain, the IoT, PS-LTE, and microgrids.